Phase 1 Complete: Foundation documentation

Added comprehensive homelab documentation:

README.md:
- Hardware inventory and specifications
- Network architecture overview
- Running services catalog
- Quick reference commands
- Project goals and roadmap

docs/network-map.md:
- All device IP assignments
- Port reference guide
- DNS configuration (Pi-hole + Unbound)
- Remote access setup (Tailscale + Cloudflare)
- Troubleshooting commands

docs/service-inventory.md:
- All 32 Docker containers cataloged
- Running services analysis (6 containers)
- Stopped services review (26 containers)
- Resource usage and recommendations
- Container decision matrix
- Cleanup plan to free 40GB
- Security recommendations
- Prioritized action plan

docs/quick-start.md:
- Emergency recovery procedures
- Service restart sequences
- Backup/restore guides with scripts
- Troubleshooting by scenario
- Health check automation
- Post-recovery checklist
- Common problem solutions

This establishes the foundation for all future homelab projects.
Phase 1 documentation complete! 🎉
2025-11-01 00:42:34 +01:00
parent e768ccb902
commit 6cbee11482
5 changed files with 3428 additions and 0 deletions

docs/network-map.md

@@ -0,0 +1,292 @@
# 🌐 Network Map & Topology
**Last Updated:** October 31, 2025
**Network Range:** 192.168.68.0/22
**Maintained By:** Weston
---
## 📊 Quick Reference
| Device | IP Address | Purpose |
|--------|-----------|---------|
| **TP-Link Router** | 192.168.68.1 | Gateway, DHCP, Mesh Primary |
| **Foxtrot (Gaming PC)** | 192.168.68.50 | Workstation |
| **Unraid Server (Tower)** | 192.168.68.51 | Main infrastructure |
| **PiKVM** | 192.168.68.53 | Server out-of-band management |
| **Pi-hole (Pi Zero 2W)** | 192.168.68.61 | DNS + Ad-blocking + Unbound |
| **Code-Server VM** | 192.168.68.70 | Ubuntu headless + VS Code |
| **TP-Link Mesh Node** | 192.168.71.250 | Office WiFi extender |
---
## 🗺️ Physical Network Topology
```
                                  Internet
                                     │ (WAN)
                            ┌────────┴────────┐
                            │  TP-Link Router │
                            │  192.168.68.1   │
                            │ (Mesh Primary)  │
                            └────────┬────────┘
                                     │ (LAN - mesh network)
           ┌─────────────────────────┼─────────────────────────┐
           │                         │                         │
     ┌─────┴─────┐             ┌─────┴─────┐             ┌─────┴─────┐
     │ TP-Link   │             │  Unraid   │             │ Pi Zero   │
     │ Mesh Node │             │  Server   │             │ Pi-hole   │
     │  .71.250  │             │  (Tower)  │             │ + Unbound │
     │ (Office)  │             │  .68.51   │             │  .68.61   │
     └─────┬─────┘             └─────┬─────┘             └───────────┘
           │                         │
     ┌─────┴─────┐             ┌─────┴─────┐
┌────┴────┐ ┌────┴────┐   ┌────┴────┐ ┌────┴────┐
│ Foxtrot │ │ Laptop  │   │  PiKVM  │ │ VM:Code │
│ Gaming  │ │ (WiFi)  │   │ .68.53  │ │ Server  │
│   PC    │ │  DHCP   │   │ (direct │ │ .68.70  │
│ .68.50  │ │         │   │ to svr) │ │         │
└─────────┘ └─────────┘   └─────────┘ └─────────┘
```
---
## 🖥️ Unraid Server Virtual Network
```
Physical: eth0 (2.5GbE) → bond0 → br0 (192.168.68.51)
                                    │
              ┌─────────────────────┼─────────────────────┐
              │                     │                     │
        ┌─────┴─────┐         ┌─────┴─────┐         ┌─────┴─────┐
        │    VMs    │         │  Docker   │         │ Tailscale │
        │(KVM/QEMU) │         │  bridge   │         │    VPN    │
        └─────┬─────┘         └─────┬─────┘         └───────────┘
              │                     │              100.122.220.126
        ┌─────┴─────┐         ┌─────┴─────┐
        │ Code-Srvr │         │  docker0  │
        │  .68.70   │         │172.17.0.1 │
        │ (Ubuntu)  │         └─────┬─────┘
        └───────────┘               │
           ┌─────────┬─────────┬────┴────┬─────────┬─────────┐
       ┌───┴───┐ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐
       │ Guac  │ │ Gitea │ │  NPM  │ │ open- │ │Cloud- │ │Vault- │
       │       │ │       │ │       │ │ webui │ │flared │ │warden │
       │  .2   │ │  .3   │ │  .4   │ │  .5   │ │  .6   │ │  .7   │
       └───────┘ └───────┘ └───────┘ └───────┘ └───────┘ └───────┘
```
---
## 📍 Complete IP Address Table
### Infrastructure & Services
| Device/Service | IP Address | MAC | Type | Notes |
|---------------|-----------|-----|------|-------|
| **TP-Link Router** | 192.168.68.1 | - | Physical | Gateway, DHCP, primary mesh |
| **Foxtrot (Gaming PC)** | 192.168.68.50 | - | Physical | Workstation, static IP |
| **Unraid Server** | 192.168.68.51 | 58:47:ca:7b:97:b0 | Physical | Main server, static IP |
| **PiKVM** | 192.168.68.53 | - | Physical | Direct to server, management |
| **Pi-hole (Pi Zero 2W)** | 192.168.68.61 | - | Physical | DNS/ad-block/Unbound, static |
| **Code-Server VM** | 192.168.68.70 | - | Virtual | Ubuntu + VS Code, KVM/QEMU |
| **Laptop** | DHCP | - | Physical | Mobile device, WiFi |
| **TP-Link Mesh Node** | 192.168.71.250 | - | Physical | Office WiFi extender |
### Docker Containers (172.17.0.0/16)
| Container | Docker IP | Host Port | Purpose |
|-----------|-----------|-----------|---------|
| **ApacheGuacamole** | 172.17.0.2 | 4000 | Remote desktop gateway |
| **Gitea** | 172.17.0.3 | 3002, 22 | Git server |
| **NginxProxyManager** | 172.17.0.4 | 1880, 7818, 18443 | Reverse proxy |
| **open-webui** | 172.17.0.5 | 3000 | LLM interface |
| **Cloudflared** | 172.17.0.6 | 46495 | Cloudflare tunnel |
| **Vaultwarden** | 172.17.0.7 | 4743 | Password manager |
### VPN
| Service | IP | Network | Purpose |
|---------|----|---------|---------|
| **Tailscale** | 100.122.220.126 | 100.64.0.0/10 | Secure remote access |
---
## 🌐 Network Details
**Subnet:** 192.168.68.0/22
**Netmask:** 255.255.252.0
**Usable Range:** 192.168.68.1 - 192.168.71.254 (1022 hosts)
**Gateway:** 192.168.68.1
**Primary DNS:** 192.168.68.61 (Pi-hole)
**Secondary DNS:** 9.9.9.9 (Quad9)
**Broadcast:** 192.168.71.255
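The values above all follow from the /22 prefix length. As a cross-check, here is a minimal bash sketch that derives them with shell integer arithmetic only; the helper names `ip_to_int`, `int_to_ip` and `cidr_info` are hypothetical, not existing tools.

```bash
#!/bin/bash
# Hypothetical helpers: IPv4 subnet math in pure bash (64-bit shell integers).

# ip_to_int A.B.C.D: convert a dotted quad to a 32-bit integer
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "$(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))"
}

# int_to_ip N: convert a 32-bit integer back to a dotted quad
int_to_ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# cidr_info ADDR/PREFIX: print netmask, network, broadcast and usable range
cidr_info() {
  local addr=${1%/*} prefix=${1#*/}
  local ip mask network broadcast
  ip=$(ip_to_int "$addr")
  # Mask = PREFIX leading 1-bits, truncated to 32 bits
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  network=$(( ip & mask ))
  # Broadcast = network address with all host bits set
  broadcast=$(( network | (~mask & 0xFFFFFFFF) ))
  echo "Netmask:   $(int_to_ip "$mask")"
  echo "Network:   $(int_to_ip "$network")"
  echo "Broadcast: $(int_to_ip "$broadcast")"
  echo "Usable:    $(int_to_ip $(( network + 1 ))) - $(int_to_ip $(( broadcast - 1 ))) ($(( broadcast - network - 1 )) hosts)"
}
```

For `192.168.68.51/22` it reports netmask 255.255.252.0, network 192.168.68.0, broadcast 192.168.71.255 and a usable range of 192.168.68.1 - 192.168.71.254 (1022 hosts), matching the values above. It is also handy if the LAN is ever re-addressed or split into VLANs.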
---
## 🔌 Port Reference Guide
### Unraid Server Ports
| Service | Port | Protocol | URL |
|---------|------|----------|-----|
| **Unraid WebUI** | 80 | HTTP | http://192.168.68.51 |
| **Unraid SSL** | 443 | HTTPS | https://192.168.68.51 |
| **SMB** | 445 | TCP | \\\\192.168.68.51 |
| **SSH** | 22 | TCP | ssh root@192.168.68.51 |
### Container Access
| Service | URL | Port | Notes |
|---------|-----|------|-------|
| **open-webui** | http://192.168.68.51:3000 | 3000 | LLM chat interface |
| **Gitea** | http://192.168.68.51:3002 | 3002 | Git web UI |
| **Gitea (domain)** | https://gitea.segelschiff.app | 443 | Via Cloudflare |
| **NPM Web** | http://192.168.68.51:1880 | 1880 | Proxy frontend |
| **NPM Admin** | http://192.168.68.51:7818 | 7818 | Management UI |
| **Guacamole** | http://192.168.68.51:4000 | 4000 | Remote desktop |
| **Vaultwarden** | http://192.168.68.51:4743 | 4743 | Password vault |
### Infrastructure Access
| Service | URL | Default Port |
|---------|-----|--------------|
| **PiKVM** | https://192.168.68.53 | 443 |
| **Pi-hole Admin** | http://192.168.68.61/admin | 80 |
| **Code-Server** | http://192.168.68.70:8080 | 8080 (typical) |
---
## 🛡️ DNS Configuration
**Primary:** Pi-hole (192.168.68.61)
- Ad-blocking
- Local DNS records
- Query logging
- DHCP relay
**Upstream:** Unbound (same device)
- Recursive DNS resolver
- No forwarding to ISP
- Privacy-focused
- DNSSEC validation
**Resolution Flow:**
```
Client → Pi-hole (192.168.68.61) → Unbound → Root Servers
```
**Fallback:** 9.9.9.9 (Quad9) - Privacy-respecting public DNS
---
## 🌐 Remote Access
### Cloudflare Tunnel
```
Internet → Cloudflare Edge → Tunnel → NPM → Services
```
- **Domain:** *.segelschiff.app
- **Services Exposed:** Gitea (and others via NPM)
- **Benefits:** No open ports, DDoS protection, SSL
- **Container:** Cloudflared (172.17.0.6)
### Tailscale VPN
```
Remote Device → Encrypted Tunnel → Unraid (100.122.220.126)
```
- **Network:** 100.64.0.0/10 (CGNAT)
- **Protocol:** WireGuard
- **Benefits:** Zero-trust, peer-to-peer, NAT traversal
- **Access:** Full homelab as if local
---
## 📊 Network Performance
| Link | Capacity | Usage | Status |
|------|----------|-------|--------|
| **Unraid NIC** | 2.5 Gbps | <1% | Underutilized |
| **Mesh Backhaul** | Unknown | Unknown | Check model specs |
| **Internet WAN** | Unknown | Unknown | ISP dependent |
**Observed (eth0):** ~2 Mbps average = 0.08% of 2.5G capacity
---
## 🔧 Troubleshooting Commands
### Connectivity Tests
```bash
# Test key infrastructure
ping 192.168.68.1 # Router
ping 192.168.68.51 # Unraid
ping 192.168.68.61 # Pi-hole
ping 192.168.68.70 # Code-Server VM
ping 8.8.8.8 # Internet
# DNS tests
nslookup google.com 192.168.68.61 # Test Pi-hole
dig @192.168.68.61 example.com # Detailed DNS query
```
### Network Status (from Unraid)
```bash
# Interfaces
ip addr show
ip link show
# Routes
ip route show
# Active connections
ss -tulpn
# Docker networks
docker network ls
docker network inspect bridge
```
### VM Network (Code-Server)
```bash
# List VMs
virsh list --all
# Get VM IP
virsh domifaddr <vm-name>
# VM network info
virsh net-info default
```
---
## 📝 Recommendations
### Security
1. ⚠️ **Separate Gitea SSH port** - Currently conflicts with Unraid SSH (both port 22)
2. ⚠️ **Implement VLANs** - Segment management/services/workstations
3. ⚠️ **Firewall hardening** - Move from ACCEPT-all to explicit rules
### Performance
1. Monitor mesh performance between nodes
2. Document ISP speeds and plan accordingly
3. Consider 10GbE upgrade path (future)
### Documentation
1. ✅ Document Code-Server VM configuration
2. ✅ Record TP-Link mesh model and capabilities
3. ✅ Map exact ISP speeds and plan
---
**Last Updated:** October 31, 2025
**Next Review:** When network topology changes
**Quick Access:** See README.md for service URLs

docs/quick-start.md

@@ -0,0 +1,954 @@
# 🚀 Quick Start & Emergency Recovery Guide
**Purpose:** Get your homelab back online quickly after disaster
**Target Time:** 30-60 minutes to basic functionality
**Last Updated:** October 31, 2025
---
## 🎯 Quick Access Reference
### Essential URLs
| Service | URL | Default Credentials |
|---------|-----|---------------------|
| **Unraid Dashboard** | http://192.168.68.51 | root / (your password) |
| **Gitea** | https://gitea.segelschiff.app | Weston / (your password) |
| **Vaultwarden** | http://192.168.68.51:4743 | Master password |
| **NPM Admin** | http://192.168.68.51:7818 | admin@example.com / changeme (first login) |
| **Pi-hole** | http://192.168.68.61/admin | (your password) |
| **PiKVM** | https://192.168.68.53 | admin / admin (default) |
### SSH Access
```bash
# Local network
ssh root@192.168.68.51
# Via Tailscale (from anywhere)
ssh root@100.122.220.126
# Emergency: Use PiKVM for console access
# https://192.168.68.53
```
---
## 🆘 Emergency Recovery Scenarios
### Scenario 1: Server Won't Boot 🚨
**Symptoms:**
- No network connectivity to 192.168.68.51
- Unraid WebUI unreachable
- No response to ping
**Recovery Steps:**
1. **Physical Check** (via PiKVM or in person)
```
[ ] Server has power (check LED)
[ ] Network cable connected to eth0
[ ] Monitor shows output (via PiKVM)
[ ] USB boot drive is present and detected
```
2. **Use PiKVM for Remote Console**
- Access: https://192.168.68.53
- Login: admin / admin
- View boot process
- Check BIOS/boot messages
3. **Common Boot Issues**
**USB Boot Drive Failure** (Most common!)
```
Symptoms: "Boot device not found" or similar
Fix:
1. Have backup USB ready
2. Shut down server (via PiKVM power control)
3. Replace USB boot drive
4. Power on
5. Restore configuration from backup
```
**BIOS Settings Changed**
```
Fix:
1. Enter BIOS (DEL/F2 during boot)
2. Load defaults
3. Verify boot order (USB first)
4. Save and exit
```
**Hardware Failure**
```
Check:
1. RAM seated properly
2. All drives detected in BIOS
3. CPU fan spinning
4. No error beeps
```
4. **Boot from Backup USB**
```
Steps:
1. Power off server
2. Insert backup USB boot drive
3. Power on
4. Verify boot successful
5. If the backup USB holds an outdated config:
- Power off and rebuild it from the latest flash backup ZIP
(see "USB Flash Backup" → "Restore from Backup")
- Boot again
```
**Prevention:**
- ✅ Keep USB flash backup updated (weekly)
- ✅ Store backup USB in safe location
- ✅ Document BIOS settings (screenshots via PiKVM)
---
### Scenario 2: Lost Admin Password
**Unraid Root Password Reset:**
1. **If You Still Have a Root Session** (SSH, WebGUI terminal, or a logged-in console via PiKVM)
```
1. Run: passwd root
2. Enter new password twice
3. Save it in Vaultwarden and update documentation
```
2. **If You Are Locked Out** (needs the USB boot drive)
```
1. Shut down the server (PiKVM power control or physically)
2. Remove the USB boot drive and plug it into another computer
3. Delete config/shadow and config/smbpasswd on the drive
4. Reinsert the USB drive and boot the server
5. Open the WebGUI (http://192.168.68.51) and set a new root password when prompted
6. Re-set SMB user passwords (cleared along with smbpasswd)
7. Update documentation
```
**Container Passwords:**
- Check `/mnt/user/appdata/<service>/config`
- Review environment variables in Docker templates
- Use Vaultwarden if accessible
- Check this documentation repo in Gitea
---
### Scenario 3: Container Won't Start
**Quick Diagnosis:**
```bash
# Check container status
docker ps -a | grep <container_name>
# View recent logs
docker logs --tail 100 <container_name>
# Show state, exit code and Docker's last error message
docker inspect --format '{{.State.Status}} exit={{.State.ExitCode}} error={{.State.Error}}' <container_name>
```
**Common Fixes:**
**Port Conflict:**
```bash
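# Find what's using the port (example: 3000); the colon and trailing
# space avoid matching other ports such as 30000
ss -tulpn | grep ':3000 '
# If a container holds it, Docker can name the container directly
docker ps --filter "publish=3000"
# Stop conflicting service
docker stop <conflicting_container>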
```
**Volume Permission Issues:**
```bash
# Check ownership
ls -la /mnt/user/appdata/<container_name>
# Fix permissions (Unraid standard: 99:100)
chown -R 99:100 /mnt/user/appdata/<container_name>
# Example: Fix Vaultwarden
chown -R 99:100 /mnt/user/appdata/vaultwarden
```
**Dependency Missing:**
```bash
# Example: Guacamole needs MariaDB
docker start mariadb
sleep 10 # Wait for database initialization
docker start ApacheGuacamole
# Verify dependency is running
docker ps | grep mariadb
```
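A fixed `sleep 10` is a guess: too short on a slow start, wasted time otherwise. A minimal sketch of a polling helper (the name `wait_for` is hypothetical, not part of Unraid or Docker) that re-runs any readiness command until it succeeds or a timeout expires:

```bash
#!/bin/bash
# wait_for TIMEOUT_SECONDS CMD [ARGS...]
# Re-run CMD once per second until it exits 0, or give up after TIMEOUT_SECONDS.
# Returns 0 on success, 1 on timeout.
wait_for() {
  local timeout=$1
  shift
  local start=$SECONDS
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS - start >= timeout )); then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
  return 0
}
```

For example: `docker start mariadb && wait_for 60 docker exec mariadb mariadb-admin ping && docker start ApacheGuacamole`. This assumes the MariaDB image ships the `mariadb-admin` client (older images call it `mysqladmin`), whose `ping` command exits 0 once the server answers.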
**Resource Exhaustion:**
```bash
# Check cache usage
df -h /mnt/cache
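# If cache full (>90%), start with dangling images only
docker image prune
# ⚠️ docker system prune -a also deletes ALL stopped containers
# (26 here!) and every image not used by a running container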
# Or free space manually
# See service-inventory.md for cleanup recommendations
```
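The "cache full (>90%)" check can be scripted before reaching for any prune command. A minimal sketch, assuming GNU `df` and `awk` as found on Unraid; the names `usage_pct` and `check_usage` and the 90% default are illustrative:

```bash
#!/bin/bash
# usage_pct PATH: print the Use% of the filesystem holding PATH as a bare integer.
# -P (POSIX format) keeps each filesystem on one line even with long device names.
usage_pct() {
  df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_usage PATH [THRESHOLD]: print usage; return 1 above THRESHOLD (default 90),
# or 2 if the usage cannot be read.
check_usage() {
  local path=$1 threshold=${2:-90} pct
  pct=$(usage_pct "$path")
  if [[ ! "$pct" =~ ^[0-9]+$ ]]; then
    echo "❓ Could not read usage for $path" >&2
    return 2
  fi
  if (( pct > threshold )); then
    echo "⚠️ $path is ${pct}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "✅ $path is ${pct}% full"
}
```

`check_usage /mnt/cache` prints the cache pool's usage and returns non-zero above 90%, so it can gate a cleanup step in a script.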
---
### Scenario 4: Network Connectivity Issues
**Can't Access from LAN:**
```bash
# SSH into Unraid (via PiKVM if network down)
ssh root@192.168.68.51
# Check if br0 is up
ip addr show br0
# Should show: 192.168.68.51/22
# Verify IP and routes
ip route | grep default
# Should show: default via 192.168.68.1
# Test router connectivity
ping -c 3 192.168.68.1
# Test internet
ping -c 3 8.8.8.8
# Test DNS (Pi-hole)
nslookup google.com 192.168.68.61
```
**Fix Network Issues:**
```bash
# Restart networking (from console/PiKVM)
/etc/rc.d/rc.inet1 restart
# If that doesn't work, reboot
reboot
```
**Can't Access Containers:**
```bash
# Check Docker network
docker network inspect bridge
# Verify container IP
docker inspect <container_name> | grep IPAddress
# Test from Unraid host
curl http://172.17.0.5:8080 # Example: open-webui
# Test port mapping
curl http://192.168.68.51:3000 # Should reach open-webui
```
**DNS Not Resolving:**
```bash
# Test Pi-hole directly
nslookup google.com 192.168.68.61
# If Pi-hole down, check Pi Zero
ping 192.168.68.61
# SSH to Pi-hole
ssh pi@192.168.68.61
# Check Pi-hole status
pihole status
# Restart if needed
pihole restartdns
```
---
### Scenario 5: Array Won't Start
**Symptoms:**
- Unraid GUI accessible but array shows "Stopped"
- Disks show errors or missing
**Troubleshooting:**
```bash
# Check disk health
smartctl -a /dev/sdb # Parity
smartctl -a /dev/sdc # Disk 1
# View disk assignments
cat /boot/config/disk.cfg
# Check for filesystem errors (read-only check; start the array
# in Maintenance mode first so /dev/md1p1 exists but is not mounted)
xfs_repair -n /dev/md1p1
```
**Common Causes:**
- Parity sync in progress (wait for completion)
- Disk failed (check SMART, may need replacement)
- Unclean shutdown (filesystem check required)
- Disk assignment changed
**Recovery:**
1. **Start Array in Maintenance Mode**
- On the Main tab, tick "Maintenance mode" next to the Start button
- Click "Start"
- If errors are suspected, run a filesystem check from the disk's page (Check Filesystem Status)
2. **Review Logs**
- Settings → System Log
- Look for disk errors
- Check for power events
3. **If Disk Failed**
- Follow Unraid disk replacement procedure
- Do NOT format or write to disk unnecessarily
- Seek help in Unraid forums if uncertain
---
## 🔧 Critical Service Restart Procedures
### Restart Core Services (Proper Order)
**1. Infrastructure First:**
```bash
# Start reverse proxy (for routing)
docker start NginxProxyManager
# Wait for it to be ready
sleep 5
docker ps | grep NginxProxyManager
# Start tunnel (for remote access)
docker start Cloudflared
# Verify both running
docker ps | grep -E "NginxProxyManager|Cloudflared"
```
**2. Security Services:**
```bash
# Password manager (critical!)
docker start vaultwarden
# Wait for healthy status
sleep 10
docker ps | grep vaultwarden
# Should show "(healthy)"
# If not healthy, check logs
docker logs --tail 50 vaultwarden
```
**3. Development Tools:**
```bash
# Git server
docker start Gitea
# Wait for initialization
sleep 5
# Remote access gateway
docker start ApacheGuacamole
# Note: Needs MariaDB if configured
```
**4. Monitoring (IMPORTANT!):**
```bash
# Database first
docker start Influxdb
# Wait for DB to initialize
sleep 15
# Then metrics collector
docker start Telegraf
# Finally visualization
docker start Grafana
# Verify all running
docker ps | grep -E "Influxdb|Telegraf|Grafana"
```
**5. Optional Services:**
```bash
# LLM backend
docker start ollama
sleep 10
# LLM interface
docker start open-webui
# Wait for healthy
docker ps | grep open-webui
```
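The `sleep` calls above are guesses at how long each service needs. A sketch that instead starts each container and polls `docker inspect` until it is running and, if the image defines a healthcheck, reports `healthy`. The function names and the 60-second default are illustrative:

```bash
#!/bin/bash
# container_ready NAME: succeed when NAME is running and either has no
# healthcheck or reports "healthy".
container_ready() {
  local state
  state=$(docker inspect --format \
    '{{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
    "$1" 2>/dev/null) || return 1
  [[ "$state" == "running healthy" || "$state" == "running none" ]]
}

# start_and_wait NAME [TIMEOUT]: start NAME, then poll until ready or TIMEOUT seconds pass.
start_and_wait() {
  local name=$1 timeout=${2:-60} start=$SECONDS
  docker start "$name" >/dev/null || return 1
  until container_ready "$name"; do
    if (( SECONDS - start >= timeout )); then
      echo "❌ $name not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 2
  done
  echo "✅ $name ready"
}

# start_in_order NAME...: start containers one at a time, stopping at the first failure.
start_in_order() {
  local name
  for name in "$@"; do
    start_and_wait "$name" || return 1
  done
}
```

For example, `start_in_order NginxProxyManager Cloudflared vaultwarden Gitea` follows the order above; use the exact names shown by `docker ps -a`. Stopping at the first failure keeps dependent services from starting against a broken dependency.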
---
### Stop All Services Gracefully
```bash
# Stop all running containers
docker stop $(docker ps -q)
# Verify all stopped
docker ps
# Should show empty output
# Wait before stopping array
sleep 5
# Stop array (from GUI)
# Main → Array Operation → Stop
```
---
## 📦 Backup & Restore Procedures
### USB Flash Backup (Unraid Configuration)
**Create Backup:**
1. Navigate to: **Main → Flash → Flash Backup**
2. Click "Backup Now"
3. Download ZIP file (e.g., `unraid-flash-backup-20251031.zip`)
4. Store securely OFF-SERVER:
- OneDrive: `/z_Unraid/Backups/`
- External drive
- Cloud storage
**Restore from Backup:**
```
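1. Rebuild the USB drive from the backup ZIP, either:
- Unraid USB Flash Creator → select "Local Zip" → choose the backup ZIP
- Or manually: format FAT32 with volume label UNRAID,
extract the ZIP to the root (config/, bzimage, bzroot, etc.),
then run the make_bootable script for your OS from the drive
2. Safely eject USB
3. Boot from new USB
4. Configuration is restored from config/
5. A new USB drive has a new GUID: transfer the license key
(Tools → Registration) when Unraid prompts for it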
```
**Frequency:**
- Weekly minimum
- After ANY configuration change
- Before major updates
---
### Container Data Backup
**Critical Directories:**
```
Priority 1 (CRITICAL):
/mnt/user/appdata/vaultwarden/ 🚨 Your passwords!
/mnt/user/appdata/gitea/ 🚨 Your code repositories!
Priority 2 (Important):
/mnt/user/appdata/NginxProxyManager/ Proxy configs
/mnt/user/appdata/Grafana/ Dashboards
/mnt/user/appdata/Influxdb/ Metrics history
Priority 3 (Optional):
/mnt/user/appdata/open-webui/ LLM chat history
```
**Quick Backup Script:**
```bash
#!/bin/bash
# Save as: /mnt/user/scripts/backup-critical.sh
set -euo pipefail
CONTAINERS=(vaultwarden Gitea NginxProxyManager)
BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"
# Restart the containers on exit, even if a backup step fails
trap 'echo "Restarting containers..."; docker start "${CONTAINERS[@]}"' EXIT
echo "Stopping containers..."
docker stop "${CONTAINERS[@]}"
echo "Backing up data..."
# -C / with relative paths stores entries as mnt/user/appdata/...,
# matching the "tar -xzf ... -C /" restore below
tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" -C / mnt/user/appdata/vaultwarden
tar -czf "$BACKUP_DIR/gitea.tar.gz" -C / mnt/user/appdata/gitea
tar -czf "$BACKUP_DIR/npm.tar.gz" -C / mnt/user/appdata/NginxProxyManager
echo "✅ Backup complete: $BACKUP_DIR"
ls -lh "$BACKUP_DIR"
```
**Make Executable:**
```bash
chmod +x /mnt/user/scripts/backup-critical.sh
```
**Run Manually:**
```bash
/mnt/user/scripts/backup-critical.sh
```
**Schedule (User Scripts Plugin):**
- Frequency: Daily at 2 AM
- Retention: Keep last 30 days
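The 30-day retention is not enforced by the backup script itself. A minimal sketch (hypothetical `prune_old_backups`) that removes backup folders older than N days. It assumes the `/mnt/user/backups/<timestamp>` layout the script creates and GNU `find`, which judges age by each folder's modification time:

```bash
#!/bin/bash
# prune_old_backups DIR DAYS: delete direct subdirectories of DIR whose
# modification time is more than DAYS days old. Prints each removed path.
prune_old_backups() {
  local dir=$1 days=$2
  if [[ ! -d "$dir" ]] || ! [[ "$days" =~ ^[0-9]+$ ]]; then
    echo "usage: prune_old_backups DIR DAYS" >&2
    return 2
  fi
  # -mindepth 1 protects DIR itself; -maxdepth 1 limits deletion to top-level backup folders
  find "$dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -print -exec rm -rf {} +
}
```

Appending `prune_old_backups /mnt/user/backups 30` to the end of `backup-critical.sh` would keep roughly the last 30 daily backups.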
---
**Restore from Backup:**
```bash
# Example: Restore Vaultwarden
docker stop vaultwarden
# Backup current (corrupted) data
mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old
# Extract backup
tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C /
# Restart container
docker start vaultwarden
# Verify working
curl http://192.168.68.51:4743
```
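Before moving the live data aside, confirm that the archive is actually readable. Otherwise a truncated or corrupt `.tar.gz` only surfaces after the old data is gone. A sketch using `tar -tzf`, which lists an archive's members and exits non-zero on a damaged gzip stream (`verify_archive` is a hypothetical name):

```bash
#!/bin/bash
# verify_archive FILE: succeed only if FILE is a readable gzip-compressed tar.
# Prints the entry count so an unexpectedly tiny archive stands out.
verify_archive() {
  local file=$1
  if [[ ! -s "$file" ]]; then
    echo "❌ $file is missing or empty" >&2
    return 1
  fi
  if ! tar -tzf "$file" >/dev/null 2>&1; then
    echo "❌ $file is not a valid .tar.gz archive" >&2
    return 1
  fi
  echo "✅ $file: $(tar -tzf "$file" | wc -l) entries"
}
```

For example: `verify_archive /mnt/user/backups/20251031_120000/vaultwarden.tar.gz && docker stop vaultwarden`.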
---
## ⚡ Quick Commands Reference
### System Status
```bash
# System uptime and load
uptime
# Resource usage
free -h
df -h
# Array status
cat /proc/mdstat
# Docker container summary
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# Temperature (if sensors installed)
sensors
# Disk health quick check
smartctl -H /dev/sdb # Parity
smartctl -H /dev/sdc # Disk 1
```
### Docker Quick Commands
```bash
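# ⚠️ Starts EVERY container, including the intentionally stopped ones,
# with no startup order (prefer the ordered restart procedure)
docker start $(docker ps -aq)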
# Stop all running containers
docker stop $(docker ps -q)
# View logs (last 50 lines)
docker logs --tail 50 <container_name>
# Follow logs in real-time
docker logs -f <container_name>
# Restart container
docker restart <container_name>
# Remove container (⚠️ will lose non-volume data!)
docker rm <container_name>
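# Clean up unused resources
# ⚠️ All three system prune variants delete EVERY stopped container (26 here!)
docker system prune            # Stopped containers, unused networks, dangling images, build cache
docker system prune -a         # ⚠️ Also removes all unused images
docker system prune --volumes  # ⚠️ Also removes unused volumes
docker image prune             # Safer: dangling images only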
```
### Network Diagnostics
```bash
# Check all interfaces
ip addr show
# Test key infrastructure
ping -c 3 192.168.68.1 # Router
ping -c 3 192.168.68.51 # Unraid
ping -c 3 192.168.68.61 # Pi-hole
ping -c 3 8.8.8.8 # Internet
# DNS resolution test
nslookup google.com
nslookup google.com 192.168.68.61 # Test Pi-hole specifically
# Check listening ports
netstat -tulpn | grep LISTEN
# Test specific port
nc -zv 192.168.68.51 3002 # Example: Gitea
curl -I http://192.168.68.51:3002 # HTTP test
```
### Quick Health Check Script
```bash
#!/bin/bash
# Save as: /mnt/user/scripts/health-check.sh
echo "=== Unraid Health Check ==="
echo ""
echo "1. Array Status:"
grep mdState /proc/mdstat
echo ""
echo "2. Running Containers:"
docker ps --format "table {{.Names}}\t{{.Status}}"
echo ""
echo "3. Disk Usage:"
df -h | grep -E "cache|disk1|Filesystem"
echo ""
echo "4. Network Connectivity:"
ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo " Router: ✅ OK" || echo " Router: ❌ FAIL"
ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo " Internet: ✅ OK" || echo " Internet: ❌ FAIL"
ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo " Pi-hole: ✅ OK" || echo " Pi-hole: ❌ FAIL"
echo ""
echo "5. Critical Services:"
# -f treats HTTP errors (4xx/5xx) as failures; --max-time prevents hangs
curl -fs --max-time 5 -o /dev/null http://localhost:4743 && echo " Vaultwarden: ✅ OK" || echo " Vaultwarden: ❌ DOWN"
curl -fs --max-time 5 -o /dev/null http://localhost:3002 && echo " Gitea: ✅ OK" || echo " Gitea: ❌ DOWN"
curl -fs --max-time 5 -o /dev/null http://localhost:7818 && echo " NPM: ✅ OK" || echo " NPM: ❌ DOWN"
echo ""
echo "=== Health Check Complete ==="
```
**Run:** `bash /mnt/user/scripts/health-check.sh`
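The script above prints results but always exits 0, so a scheduler (User Scripts, cron) cannot tell that something failed. A sketch of a small reporting helper that counts failures and returns a non-zero status. `check`, `health_summary` and `FAILURES` are illustrative names, and each check is any command that exits 0 on success:

```bash
#!/bin/bash
# check LABEL CMD [ARGS...]: run CMD quietly, print OK/FAIL, and count failures.
FAILURES=0
check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "  $label: ✅ OK"
  else
    echo "  $label: ❌ FAIL"
    FAILURES=$(( FAILURES + 1 ))
  fi
}

# health_summary: print the failure count and return non-zero if any check failed.
health_summary() {
  echo "Failures: $FAILURES"
  (( FAILURES == 0 ))
}

# Example (inside health-check.sh):
#   check "Router" ping -c 2 192.168.68.1
#   check "Vaultwarden" curl -fs --max-time 5 -o /dev/null http://localhost:4743
#   health_summary || exit 1
```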
---
## 📞 Getting Help
### Pre-flight Checks
Before asking for help, gather this information:
1. **System Diagnostics**
- Unraid WebGUI: Tools → Diagnostics → Download
- Creates ZIP with all logs
2. **Container Logs**
```bash
docker logs <container_name> > container-logs.txt 2>&1
```
3. **Network Configuration**
```bash
ip addr show > network-config.txt
ip route show >> network-config.txt
```
4. **Disk Status**
```bash
smartctl -a /dev/sdb > disk-smart.txt
smartctl -a /dev/sdc >> disk-smart.txt
```
### Community Resources
- **Unraid Forums:** https://forums.unraid.net/
- Post diagnostics ZIP
- Be specific about symptoms
- Include what you've tried
- **r/unraid:** https://reddit.com/r/unraid
- Quick questions
- Share diagnostics in pastebin
- **Discord:** Unraid Official Discord
- Real-time help
- Active community
### Emergency Contacts
```
ISP Support: [Your ISP Phone Number]
Unraid License: [Store in secure location]
USB Backup Location: [Document where stored]
Off-site Backup: [If applicable]
```
---
## 🎓 Post-Recovery Checklist
After restoring from disaster:
```
[ ] Unraid array started successfully
[ ] All critical services running
[ ] NginxProxyManager
[ ] Cloudflared
[ ] Vaultwarden
[ ] Gitea
[ ] Network connectivity verified
[ ] Can access Unraid WebUI
[ ] Can ping router (192.168.68.1)
[ ] Internet working
[ ] DNS resolving (Pi-hole)
[ ] Vaultwarden accessible (test password retrieval)
[ ] Gitea accessible (verify repositories intact)
[ ] NPM routing working (test reverse proxy)
[ ] Monitoring stack restarted
[ ] Grafana
[ ] InfluxDB
[ ] Telegraf
[ ] External access working
[ ] Tailscale connected
[ ] Cloudflare tunnel active
[ ] Backups verified and up-to-date
[ ] Documentation updated with lessons learned
[ ] Incident documented in change log (Gitea)
```
---
## 🔒 Security After Recovery
**Immediately After Disaster Recovery:**
1. **Change Passwords** (if compromise suspected)
```
[ ] Unraid root password
[ ] Vaultwarden master password
[ ] Container admin passwords
[ ] Pi-hole admin password
[ ] PiKVM password
```
2. **Review Access Logs**
```bash
# Check SSH attempts (Unraid logs sshd to the system log)
grep "Failed password" /var/log/syslog | tail -50
# Check NPM access (docker logs writes to both stdout and stderr)
docker logs NginxProxyManager 2>&1 | grep -i error
# Check Gitea access
docker logs Gitea 2>&1 | grep -i login
```
3. **Verify Firewall Rules**
```bash
iptables -L -n -v
```
4. **Check for Unauthorized Changes**
```bash
# Review Docker containers
docker ps -a
# Check cron jobs
crontab -l
# Review network interfaces
ip addr show
```
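Spotting an unexpected container in `docker ps -a` by eye is error-prone. A sketch that compares the current container names against a baseline saved while the system was known-good. The function names and the baseline location are assumptions:

```bash
#!/bin/bash
# Hypothetical helpers for spotting unexpected containers.
# LC_ALL=C keeps sort and comm on the same collation order.

# current_containers: print all container names (running or not), sorted.
current_containers() {
  docker ps -a --format '{{.Names}}' | LC_ALL=C sort
}

# save_baseline FILE: record the current container list as known-good.
save_baseline() {
  current_containers > "$1"
}

# compare_to_baseline FILE: print "+ name" for containers that appeared and
# "- name" for containers that disappeared. Returns 1 if anything changed.
compare_to_baseline() {
  local baseline=$1 added removed
  added=$(LC_ALL=C comm -13 "$baseline" <(current_containers))
  removed=$(LC_ALL=C comm -23 "$baseline" <(current_containers))
  if [[ -z "$added" && -z "$removed" ]]; then
    echo "✅ No container changes since baseline"
    return 0
  fi
  if [[ -n "$added" ]]; then sed 's/^/+ /' <<< "$added"; fi
  if [[ -n "$removed" ]]; then sed 's/^/- /' <<< "$removed"; fi
  return 1
}
```

For example, run `save_baseline /boot/config/container-baseline.txt` after a clean setup (the flash-drive path is only a suggestion; any persistent location works), then run `compare_to_baseline /boot/config/container-baseline.txt` during recovery.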
---
## 📝 Documentation Updates After Incident
**What to Document:**
1. **What Happened:**
- Date/time of incident
- Symptoms observed
- Root cause (if determined)
- Duration of outage
2. **What You Did:**
- Steps taken to recover
- What worked / didn't work
- Resources used (forums, docs, etc.)
- Time to recovery
3. **Lessons Learned:**
- What could prevent this in future
- Process improvements needed
- Documentation gaps discovered
- Backup improvements needed
4. **Action Items:**
- Backups to implement/improve
- Monitoring to add
- Scripts to create
- Hardware to replace/upgrade
**Where to Document:**
- Create incident report: `docs/incidents/YYYY-MM-DD-incident-name.md`
- Update this quick-start guide with new procedures
- Add to troubleshooting section if recurring issue
- Commit to Gitea with detailed message
---
## 🚀 Normal Startup Sequence
**From Cold Boot:**
```
1. Power on server
2. BIOS POST (~30 seconds)
- Hardware check
- Memory test
- Drive detection
3. Unraid boots from USB (~1-2 minutes)
- Linux kernel loads
- Unraid OS starts
4. Network initializes
- br0 interface up
- Gets IP: 192.168.68.51
5. Array auto-starts (if configured)
- Parity disk: sdb
- Data disk: sdc
- Cache: nvme1n1p1
6. Docker service starts
- docker0 bridge created
- Networks initialized
7. Containers auto-start (if enabled)
- Infrastructure services first
- Then application services
8. Services available (~3-5 minutes total)
✅ Ready to use!
```
**Expected Boot Time:** 3-5 minutes
**If Taking Longer:** Check system log for errors
---
## 🎯 Quick Health Check Command
**Run After Any Restart:**
```bash
# Quick health check (separate commands, so a failing docker/df step
# can't trigger a misleading "Network: FAIL")
docker ps --format "table {{.Names}}\t{{.Status}}"
df -h | grep -E "Filesystem|cache|disk1"
ping -c 2 192.168.68.1 >/dev/null && echo "Network: OK" || echo "Network: FAIL"
```
---
## 📚 Related Documentation
- **Network Issues:** See `network-map.md`
- **Service Details:** See `service-inventory.md`
- **Container Configs:** See `docker-compose/` (when created)
- **Main Overview:** See `README.md`
---
## 🆘 True Emergency - Complete System Down
**If everything is down and you need immediate help:**
1. **Access via PiKVM**
- https://192.168.68.53
- Get console access
- View what's happening
2. **Check Physical Server**
- Power LED on?
- Fans spinning?
- Drives spinning up?
- Network activity lights?
3. **Try Safe Mode Boot**
- Choose a Safe Mode entry (plugins disabled) from the Unraid boot menu via PiKVM
- Diagnose from console
4. **Community Help**
- Unraid Discord (fastest response)
- Forums with diagnostics ZIP
- r/unraid for quick questions
5. **Document Everything**
- Take photos/screenshots via PiKVM
- Note exact error messages
- Record what you tried
- Timeline of events
---
## 💡 Pro Tips
1. **Test Your Backups**
- Restore test annually
- Verify data integrity
- Practice recovery procedures
2. **Keep This Guide Accessible**
- Save offline copy to phone/laptop
- Print critical sections
- Bookmark in browser
3. **Automate Where Possible**
- Schedule backup scripts
- Set up monitoring alerts
- Use User Scripts plugin
4. **Document As You Go**
- Update after fixing issues
- Add new procedures discovered
- Note what worked/didn't work
---
**Last Updated:** October 31, 2025
**Next Review:** Quarterly or after incidents
**Maintained By:** Weston
---
**Remember:** Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help!
**Keep this guide accessible even when the server is down!**
💡 **Pro Tip:** Save a copy to your phone/laptop/OneDrive!
🚀 **You've got this!**

docs/service-inventory.md

@@ -0,0 +1,614 @@
# 📦 Service Inventory - Complete Container Catalog
**Last Updated:** October 31, 2025
**Total Containers:** 32 (6 running, 26 stopped)
**Purpose:** Comprehensive catalog of all services
---
## 📊 Quick Stats
| Metric | Value | Status |
|--------|-------|--------|
| **Total Containers** | 32 | - |
| **Running** | 6 | ✅ 19% |
| **Stopped** | 26 | ⚠️ 81% |
| **Total Docker Images** | ~50GB | ⚠️ High |
| **Cache Usage** | 578GB / 932GB | ⚠️ 62% |
**Key Insight:** 81% of containers are stopped - cleanup opportunity!
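To review the stopped containers before deciding what to remove, here is a sketch that lists exited containers with their images and recomputes the stopped share. It relies on `docker ps`'s `status=exited` filter, and `stopped_summary` is a hypothetical name:

```bash
#!/bin/bash
# stopped_summary: list exited containers with their images, then the stopped share.
# Containers in the "created" state (never started) are not counted as stopped here.
stopped_summary() {
  local total stopped
  total=$(docker ps -aq | wc -l)
  stopped=$(docker ps -aq --filter status=exited | wc -l)
  docker ps -a --filter status=exited --format '{{.Names}}\t{{.Image}}'
  if (( total > 0 )); then
    # Integer division rounds down, e.g. 26 of 32 -> 81%
    echo "Stopped: $stopped of $total ($(( stopped * 100 / total ))%)"
  fi
}
```

If all 26 stopped containers are in the exited state, the last line reads `Stopped: 26 of 32 (81%)`, matching the figures above.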
---
## 🟢 Running Services (6 containers)
### 1. open-webui ⭐⭐⭐
**Status:** Running (healthy)
**Container:** open-webui
**Image:** ghcr.io/open-webui/open-webui:main (4.55GB)
**Created:** 2025-10-16 (2 weeks ago)
**Network:** bridge (172.17.0.5)
**Ports:** 8080 → 3000 (container → host)
**Resources:**
- CPU: 0.15%
- Memory: 1.026GB / 60.55GB (1.69%)
- Storage: 42.4MB
**Purpose:** LLM chat interface (ChatGPT-like UI for local models)
**Dependencies:**
- ollama (currently STOPPED ❌)
- OpenAI API key (configured)
**Access:**
- Local: http://192.168.68.51:3000
- No authentication by default
**Issues:**
- ⚠️ Depends on ollama container which is stopped
- ⚠️ OpenAI API key exposed in environment variables
**Recommendations:**
1. **KEEP** - Active LLM interface
2. Restart ollama container to enable local models
3. Move API keys to Docker secrets
4. Enable authentication
**Priority:** HIGH - Core AI/ML service
---
### 2. NginxProxyManager ⭐⭐⭐
**Status:** Running
**Container:** NginxProxyManager
**Image:** jlesage/nginx-proxy-manager (189MB)
**Created:** 2025-10-11 (3 weeks ago)
**Network:** bridge (172.17.0.4)
**Ports:** 4443→18443, 8080→1880, 8181→7818
**Resources:**
- CPU: 0.08%
- Memory: 77.45MB (0.12%)
- Storage: 13.4KB
**Purpose:** Reverse proxy with web UI - SSL termination and routing
**Dependencies:** None
**Access:**
- Admin UI: http://192.168.68.51:7818
- HTTP: http://192.168.68.51:1880
- HTTPS: https://192.168.68.51:18443
**Configuration:**
- Routes traffic to backend services
- Manages SSL certificates
- Provides access control
**Recommendations:**
1. **KEEP** - Critical infrastructure
2. Document all proxy rules in Gitea
3. Verify SSL auto-renewal is configured
4. Enable MFA if available
5. Review access logs regularly
**Priority:** CRITICAL - Core infrastructure
---
### 3. Gitea ⭐⭐⭐
**Status:** Running
**Container:** Gitea
**Image:** gitea/gitea (180MB)
**Created:** 2025-10-08 (3 weeks ago)
**Network:** bridge (172.17.0.3)
**Ports:** 22→22, 3000→3002
**Resources:**
- CPU: 0.11%
- Memory: 114.5MB (0.18%)
- Storage: 113MB (active repositories!)
**Purpose:** Self-hosted Git server (GitHub alternative)
**Dependencies:** None (internal SQLite)
**Access:**
- Web: http://192.168.68.51:3002
- Domain: https://gitea.segelschiff.app
- SSH: ssh://192.168.68.51:22 ⚠️ (conflicts with Unraid SSH)
**Configuration:**
- Using latest tag (unpinned version)
- Storage: /mnt/user/appdata/gitea
**Issues:**
- ⚠️ SSH port 22 conflicts with Unraid SSH
- ⚠️ Using `latest` tag (version not pinned)
- ⚠️ Backup strategy unknown
**Recommendations:**
1. **KEEP** - Critical for version control
2. Change SSH port to 2222 to avoid conflict
3. Pin to specific version tag
4. Implement automated backups (CRITICAL!)
5. This is your version control hub - protect it!
**Priority:** CRITICAL - Infrastructure documentation depends on this
---
### 4. ApacheGuacamole ⭐⭐
**Status:** Running (2+ months uptime!)
**Container:** ApacheGuacamole
**Image:** jasonbean/guacamole (737MB)
**Created:** 2025-08-22 (2+ months ago)
**Network:** bridge (172.17.0.2)
**Ports:** 8080→4000
**Resources:**
- CPU: 0.16%
- Memory: 785.8MB (1.27%)
- Storage: 46.2MB
**Purpose:** Clientless remote desktop gateway (RDP/VNC/SSH via browser)
**Dependencies:**
- MariaDB (STOPPED ❌) - **BROKEN DEPENDENCY!**
**Access:**
- Web: http://192.168.68.51:4000
**Configuration:**
- MySQL enabled but MariaDB stopped
- Multiple auth modules: MySQL, LDAP, TOTP, etc.
**Issues:**
- 🚨 **CRITICAL:** Depends on MariaDB which is stopped!
- Currently using embedded database (not recommended)
- Data loss risk without proper database backend
**Recommendations:**
1. ⚠️ **FIX IMMEDIATELY** - Restart MariaDB or reconfigure
2. If keeping: Start MariaDB and verify connection
3. If not using: Stop Guacamole and remove both
4. Document your use case for remote desktop access
**Priority:** MEDIUM - Fix dependency or remove
---
### 5. Cloudflared ⭐⭐⭐
**Status:** Running (2.5+ months - very stable!)
**Container:** Unraid-Cloudflared-Tunnel
**Image:** figro/unraid-cloudflared-tunnel (8.92MB)
**Created:** 2025-08-10 (2.5+ months ago)
**Network:** bridge (172.17.0.6)
**Ports:** 46495→46495 (metrics)
**Resources:**
- CPU: 0.33% (highest of running containers)
- Memory: 68.6MB (0.11%)
- Network I/O: 41.7MB RX / 310KB TX
**Purpose:** Cloudflare Tunnel - secure external access without port forwarding
**Dependencies:** None
**Access:**
- Metrics: http://192.168.68.51:46495
- Domain: *.segelschiff.app (managed via Cloudflare)
**Configuration:**
- Tunnel token configured
- No auto-update enabled
- Metrics exposed for monitoring
**Security:**
- ⚠️ Tunnel token in plain text environment variable
- ✅ No open ports on router (excellent!)
**Recommendations:**
1. **KEEP** - Excellent security practice
2. Rotate tunnel token periodically
3. Document which services are exposed
4. Integrate metrics with monitoring stack
**Priority:** HIGH - Critical for secure remote access
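Item 4 can start with a simple probe. cloudflared's metrics server exposes a `/ready` endpoint. It returns HTTP 200 while the tunnel holds at least one connection to Cloudflare, and an error status otherwise. This sketch assumes port 46495 is that metrics server, as the port list above suggests. The name `tunnel_ready` is hypothetical.

```bash
#!/bin/bash
# Hypothetical probe: exit 0 if the tunnel reports ready, 1 otherwise.
# Assumes 46495 is cloudflared's metrics listener, which serves /ready.
tunnel_ready() {
  local endpoint="${1:-http://192.168.68.51:46495}"
  # -f turns HTTP errors (e.g. 503 with no tunnel connections) into a non-zero exit
  if curl -sf --max-time 5 "${endpoint}/ready" >/dev/null; then
    echo "tunnel: ready"
  else
    echo "tunnel: NOT ready" >&2
    return 1
  fi
}
```

Usage: `tunnel_ready || docker logs --tail 50 Unraid-Cloudflared-Tunnel`.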
---
### 6. Vaultwarden ⭐⭐⭐
**Status:** Running (healthy) - 3+ months uptime!
**Container:** vaultwarden
**Image:** vaultwarden/server (256MB)
**Created:** 2025-07-31 (3+ months ago)
**Network:** bridge (172.17.0.7)
**Ports:** 80→4743
**Resources:**
- CPU: 0.00% (idle)
- Memory: 24.96MB (0.04%) - Very lightweight!
**Purpose:** Self-hosted password manager (Bitwarden compatible)
**Dependencies:** None
**Access:**
- Web: http://192.168.68.51:4743
- Admin: http://192.168.68.51:4743/admin
**Configuration:**
- Signups allowed: true ⚠️
- Invitations allowed: false ✅
- WebSocket disabled ⚠️
- Admin token exposed ⚠️
**Issues:**
- 🚨 **CRITICAL:** No backup strategy evident!
- ⚠️ Admin token in plain text
- ⚠️ Signups open (verify intentional)
- ⚠️ WebSocket disabled (reduces functionality)
**Recommendations:**
1. **KEEP** - Critical security infrastructure
2. 🚨 **IMPLEMENT BACKUP IMMEDIATELY** - This is your password vault!
3. Close signups after initial setup
4. Rotate admin token and use secrets management
5. Enable WebSocket for better sync
6. Automate daily backups to off-site location
**Priority:** CRITICAL - Contains all your passwords!
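Items 3 and 4 can be checked from the host. This is a minimal audit sketch. It assumes the settings are passed as container environment variables, which is the Unraid template default. It reads them with `docker inspect` and flags open signups and a plain-text `ADMIN_TOKEN`.

Vaultwarden accepts an Argon2 hash in `ADMIN_TOKEN`, generated with `vaultwarden hash`, so a value starting with `$argon2` is treated as hashed. Settings saved through the `/admin` page are stored separately in `config.json` and can override these variables. Treat a clean result as a hint, not proof. The name `audit_vaultwarden` is hypothetical.

```bash
#!/bin/bash
# Hypothetical audit helper for the Vaultwarden settings flagged above.
# Reads only container env vars; values saved via /admin (config.json) are not seen.
audit_vaultwarden() {
  local container="${1:-vaultwarden}" vars token status=0
  # One KEY=VALUE per line from the container's configured environment
  vars="$(docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$container")" || return 2

  # Vaultwarden allows signups unless SIGNUPS_ALLOWED=false
  if grep -qx 'SIGNUPS_ALLOWED=false' <<<"$vars"; then
    echo "signups: closed"
  else
    echo "signups: OPEN (set SIGNUPS_ALLOWED=false)"
    status=1
  fi

  # cut -f2- keeps everything after the first '=' (Argon2 strings contain '=')
  token="$(grep '^ADMIN_TOKEN=' <<<"$vars" | cut -d= -f2-)"
  if [[ -z "$token" ]]; then
    echo "admin token: not set (admin page disabled)"
  elif [[ "$token" == '$argon2'* ]]; then
    echo "admin token: Argon2 hash"
  else
    echo "admin token: PLAIN TEXT (hash it with 'vaultwarden hash')"
    status=1
  fi
  return "$status"
}
```

`audit_vaultwarden vaultwarden` exits non-zero when something needs attention, so it can run on a schedule next to the backup.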
---
## 🔴 Recently Stopped Services (Worth Investigating)
### 7. ollama ⚠️
**Status:** Exited (128) 4 minutes ago
**Image:** ollama/ollama (3.33GB)
**Purpose:** Local LLM inference engine
**Why It Matters:** open-webui depends on this!
**Recommendations:**
1. 🔧 **RESTART** - Required for open-webui local models
2. Investigate exit code 128 (configuration issue?)
3. Configure GPU acceleration (RTX 4090!)
4. Test with open-webui after restart
**Action:** `docker start ollama && docker logs -f ollama`
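For item 2, Docker reports the exit status of the container's main process. By shell convention, a value above 128 means the process was terminated by signal (code − 128). For example, 137 is SIGKILL, which can come from the OOM killer or from `docker kill`. The value 128 itself is not a signal code, so its cause has to come from the logs. The decoder below is a small sketch, and `explain_exit_code` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: translate a container exit code into a likely cause.
explain_exit_code() {
  local code="$1"
  if (( code == 0 )); then
    echo "exit 0: clean shutdown"
  elif (( code > 128 && code < 160 )); then
    # 128 + N means the process was terminated by signal N (standard signals 1-31)
    local sig=$(( code - 128 ))
    echo "exit ${code}: terminated by signal ${sig} (SIG$(kill -l "$sig"))"
  else
    echo "exit ${code}: set by the process itself; check 'docker logs' for the cause"
  fi
}
```

Usage: `explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' ollama)"`.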
---
### 8. Monitoring Stack (Stopped 12 days ago) 🚨
**Containers:**
- Grafana (stopped 12 days)
- InfluxDB (stopped 12 days)
- Telegraf (stopped 12 days)
**Total Size:** ~1.7GB
**Why Critical:** Zero observability into system health!
**Recommendations:**
1. 🚨 **RESTART IMMEDIATELY** - Priority 1!
2. Configure dashboards for:
- Docker container stats
- System resources (CPU, RAM, disk)
- Network traffic
- Temperature sensors
3. Set up alerting for critical issues
4. Document in runbook
**Action:**
```bash
docker start Influxdb
sleep 15 # Wait for DB initialization
docker start Telegraf
docker start Grafana
```
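The fixed `sleep 15` is a guess. Polling a readiness endpoint is more reliable: InfluxDB answers `GET /ping` with HTTP 204 once it is serving requests. This is a minimal sketch. It assumes InfluxDB's default port 8086 is published on the host, which the inventory above does not confirm. The name `wait_for_url` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical readiness helper: poll a URL until curl succeeds or time runs out.
# The timeout is approximate: it counts 1-second sleeps, not wall-clock time.
wait_for_url() {
  local url="$1" timeout="${2:-60}" waited=0
  # -f makes curl fail on HTTP >= 400; --max-time stops a hung request
  until curl -sf --max-time 2 -o /dev/null "$url"; do
    if (( waited >= timeout )); then
      echo "timed out after ${timeout}s waiting for ${url}" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  echo "ready: ${url}"
}
```

Usage: `docker start Influxdb && wait_for_url http://localhost:8086/ping 60 && docker start Telegraf Grafana`.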
---
### 9. MariaDB (Stopped 12 days ago) ⚠️
**Status:** Exited (0) 12 days ago
**Image:** lscr.io/linuxserver/mariadb (348MB)
**Purpose:** MySQL database for Guacamole
**Issue:** Guacamole is running but database is stopped!
**Recommendations:**
1. If using Guacamole: **RESTART**
2. If not using Guacamole: **REMOVE BOTH**
3. Document decision
---
### 10. Database Admin Tools (Stopped 12 days ago)
**CloudBeaver** - Stopped 12 days
**adminer** - Stopped 12 days
**Issue:** Two database admin tools - redundant!
**Recommendations:**
1. **CHOOSE ONE:**
- CloudBeaver: Feature-rich (725MB)
- adminer: Lightweight (118MB)
2. Remove the other
3. Only restart if you need database management
---
## 🟡 Experimental / Inactive Services (Decision Needed)
### 11. Nextcloud AIO Stack (8 containers!) 🚨
**Status:** All stopped 3 weeks ago
**Total Size:** ~7GB Docker images + data
**Containers:**
- nextcloud-aio-mastercontainer
- nextcloud-aio-apache
- nextcloud-aio-nextcloud (2.19GB)
- nextcloud-aio-database (PostgreSQL)
- nextcloud-aio-redis
- nextcloud-aio-onlyoffice (3.79GB!)
- nextcloud-aio-imaginary
- nextcloud-aio-notify-push
**Data:** /mnt/user/nextcloud (~1GB+)
**Analysis:**
- Massive resource footprint
- "All-in-One" = heavy coupling
- Stopped for 3 weeks suggests not critical
**Recommendations:**
**DECISION REQUIRED:**
**Option A: Remove Everything**
```bash
# Backup data first!
cp -r /mnt/user/nextcloud /mnt/user/backup/nextcloud-$(date +%Y%m%d)
# Remove containers (a shell glob like nextcloud-aio-* does not match container names)
docker rm $(docker ps -aq --filter "name=nextcloud-aio-")
# Remove images to free space
docker rmi $(docker images | grep nextcloud | awk '{print $3}')
# Archive data
tar -czf nextcloud-data-backup.tar.gz /mnt/user/nextcloud
```
**Saves:** ~7GB+ space
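Before running Option A, a dry run shows exactly what matches. This is a sketch with a hypothetical helper, `preview_nextcloud_removal`, that lists the matching containers and Docker named volumes.

It assumes AIO's usual convention of named volumes prefixed `nextcloud_aio_`, for example `nextcloud_aio_mastercontainer`. `docker rm` leaves named volumes in place, so any space they use is not freed by Option A alone.

```bash
#!/bin/bash
# Hypothetical dry run: list what Option A would touch, without removing anything.
preview_nextcloud_removal() {
  echo "Containers:"
  docker ps -a --filter "name=nextcloud-aio-" --format '  {{.Names}}  ({{.Status}})'
  echo "Named volumes (NOT removed by docker rm):"
  docker volume ls -q --filter "name=nextcloud_aio_" | sed 's/^/  /'
}
```

Review the list first. Remove the volumes only after the data backup is verified: `docker volume rm $(docker volume ls -q --filter "name=nextcloud_aio_")`.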
**Option B: Keep and Restart**
- Document why you need it
- Create restart procedure
- Implement backup strategy
- Monitor resource usage
**My Recommendation:** Remove unless actively needed. Nextcloud is great but this All-in-One stack is heavy.
---
### 12. Jellyfin (Stopped 2 weeks ago) ⚠️
**Status:** Exited (0) 2 weeks ago
**Image:** jellyfin/jellyfin (1.25GB)
**GPU:** RTX 4090 allocated but idle!
**Media:**
- Movies: /mnt/user/movies
- TV: /mnt/user/tv shows
- Music: /mnt/user/music
**Issue:** $1600 GPU sitting idle!
**Recommendations:**
**If you want media server:**
1. **RESTART** with hardware transcoding:
```bash
docker start Jellyfin
```
2. Configure NVENC/NVDEC for RTX 4090
3. Test 4K transcoding performance
4. Switch from `host` network to bridge (security)
**If you don't need media server:**
1. Remove GPU allocation from container
2. Free GPU for other projects (AI/ML)
**Action Required:** Decide on media server strategy
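Before deciding, it helps to confirm what the container is actually allocated. This is a sketch with a hypothetical helper, `gpu_allocation`. It assumes the usual Unraid Nvidia-plugin setup, where GPU access comes from `--runtime=nvidia` plus `NVIDIA_VISIBLE_DEVICES`. Under that setup, removing the allocation means dropping both from the container template.

```bash
#!/bin/bash
# Hypothetical check: does the container have an NVIDIA runtime / GPU assigned?
# Assumes the Unraid Nvidia-plugin convention (--runtime=nvidia + NVIDIA_* env vars).
gpu_allocation() {
  local container="${1:-Jellyfin}"
  echo "runtime: $(docker inspect --format '{{.HostConfig.Runtime}}' "$container")"
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$container" \
    | grep -E '^NVIDIA_(VISIBLE_DEVICES|DRIVER_CAPABILITIES)=' \
    || echo "no NVIDIA_* variables set"
}
```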
---
### 13. Large AI/ML Containers (Rarely Used)
**ebook2audiobook** - 20.06GB! (stopped 3 weeks)
**docling-serve** - 14.45GB! (stopped 2 weeks)
**Total:** 34.5GB for two containers!
**Analysis:**
- Massive images
- Rarely used (stopped weeks ago)
- Experimental/one-time use?
**Recommendations:**
1. **REMOVE** both to free 34.5GB
2. If needed again, pull fresh images
3. Document use cases if keeping
**Potential Savings:** 34.5GB cache space!
---
### 14. Productivity Suite (Multiple Stopped)
**baserow** - Stopped 2 weeks (2.25GB)
**NocoDB** - Stopped 3 weeks (588MB)
**OpenProject** - Stopped 7 weeks (2.87GB)
**Issue:** Three project management tools - redundant!
**Recommendations:**
1. **CHOOSE ONE** (or none if not used)
2. Remove the others
3. Migrate data if needed first
**Potential Savings:** ~5GB
---
### 15. Development Tools
**n8n** (workflow automation) - Created but never started
**steam-headless** - Created but not running
**Recommendations:**
- Document if you have plans for these
- Remove if experimental and abandoned
---
## 📋 Container Decision Matrix
| Container | Keep? | Action | Priority |
|-----------|-------|--------|----------|
| **open-webui** | ✅ Yes | Keep running, restart ollama | HIGH |
| **NginxProxyManager** | ✅ Yes | Keep, document configs | CRITICAL |
| **Gitea** | ✅ Yes | Keep, fix SSH port, backup | CRITICAL |
| **ApacheGuacamole** | ⚠️ Decide | Fix MariaDB OR remove both | MEDIUM |
| **Cloudflared** | ✅ Yes | Keep, rotate token | HIGH |
| **Vaultwarden** | ✅ Yes | Keep, BACKUP NOW! | CRITICAL |
| **ollama** | ✅ Yes | Restart immediately | HIGH |
| **Monitoring Stack** | ✅ Yes | Restart all 3 containers | CRITICAL |
| **MariaDB** | ⚠️ Conditional | If Guacamole stays | MEDIUM |
| **Nextcloud AIO (7)** | ❌ Remove | Backup data, remove stack | LOW |
| **Jellyfin** | ⚠️ Decide | Use GPU or remove | MEDIUM |
| **ebook2audiobook** | ❌ Remove | Free 20GB | LOW |
| **docling-serve** | ❌ Remove | Free 14.5GB | LOW |
| **baserow/NocoDB/OpenProject** | ❌ Choose 1 | Remove others | LOW |
| **CloudBeaver/adminer** | ⚠️ Choose 1 | Keep one DB admin | LOW |
---
## 🎯 Recommended Action Plan
### Phase 1: Critical (Do First!) 🚨
1. **Backup Vaultwarden** (30 min)
```bash
docker stop vaultwarden
tar -czf vaultwarden-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/vaultwarden
docker start vaultwarden
```
2. **Backup Gitea** (30 min)
```bash
docker stop Gitea
tar -czf gitea-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/gitea
docker start Gitea
```
3. **Restart Monitoring Stack** (15 min)
```bash
docker start Influxdb && sleep 15
docker start Telegraf Grafana
# Configure dashboards
```
4. **Restart ollama** (5 min)
```bash
docker start ollama
docker logs -f ollama
```
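Steps 1 and 2 can be wrapped into one reusable function that also verifies the archive and always restarts the container. This is a minimal sketch, and `backup_appdata` is a hypothetical helper. Its `-C` form stores paths relative to the appdata folder (for example `vaultwarden/...`). The commands above store the absolute `/mnt/user/...` paths instead.

```bash
#!/bin/bash
# Hypothetical helper: stop a container, archive its appdata, ALWAYS restart it.
backup_appdata() {
  local container="$1" src="$2" dest_dir="$3" rc=0
  local archive="${dest_dir}/${container}-backup-$(date +%Y%m%d).tar.gz"

  docker stop "$container" >/dev/null || return 1

  # -C stores paths relative to the parent dir (e.g. vaultwarden/...), not /mnt/...
  # tar -tzf re-reads the archive to catch a truncated or corrupt file
  if ! tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" \
     || ! tar -tzf "$archive" >/dev/null; then
    echo "backup of ${src} failed" >&2
    rm -f "$archive"
    rc=1
  fi

  # Restart even when the archive step failed, so the service is not left down
  docker start "$container" >/dev/null || rc=1
  [ "$rc" -eq 0 ] && echo "$archive"
  return "$rc"
}
```

Usage: `backup_appdata vaultwarden /mnt/user/appdata/vaultwarden /mnt/user/backups` prints the archive path on success. Because the paths are relative, restore with the container stopped using `tar -xzf <archive> -C /mnt/user/appdata`.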
### Phase 2: Cleanup (Free Space!) 💾
5. **Remove Large Unused Containers** (1 hour)
- ebook2audiobook (20GB)
- docling-serve (14.5GB)
- Nextcloud AIO stack (7GB)
- **Saves: ~41GB!**
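6. **Docker System Cleanup**
```bash
# ⚠️ `docker system prune -a` would also delete ALL stopped containers
# (MariaDB, Jellyfin, ...). Prune only unreferenced images + build cache:
docker image prune -a
docker builder prune
```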
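`docker image prune -a` removes every image that no container references, whether that container is running or stopped. To preview that set before confirming, here is a sketch with a hypothetical helper, `unused_images`. It compares the full IDs of all images against the images used by all containers.

```bash
#!/bin/bash
# Hypothetical preview: image IDs not referenced by any container (running or stopped).
unused_images() {
  local containers
  containers="$(docker ps -aq)"
  # comm -23: lines only in the first (sorted) list, i.e. images nobody uses
  comm -23 \
    <(docker images -q --no-trunc | sort -u) \
    <(if [ -n "$containers" ]; then
        # word splitting on $containers is intentional: one ID per argument
        docker inspect --format '{{.Image}}' $containers
      fi | sort -u)
}
```

Usage: `for id in $(unused_images); do docker image inspect --format '{{.RepoTags}} {{.Size}}' "$id"; done` shows the tags and size in bytes for each candidate.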
### Phase 3: Decisions (This Week)
7. **Guacamole + MariaDB** - Keep or remove?
8. **Jellyfin** - Restart with GPU or remove?
9. **Productivity tools** - Choose one, remove others
10. **Database admin** - CloudBeaver or adminer?
---
## 📊 Storage Cleanup Impact
**Current Cache Usage:** 578GB / 932GB (63%)
**After Recommended Cleanup:**
- Remove ebook2audiobook: -20GB
- Remove docling-serve: -14.5GB
- Remove Nextcloud AIO: -7GB
- Docker system prune: ~10-20GB
- **Total Freed: ~50-60GB**
**New Cache Usage:** ~520GB / 932GB (56%) ✅
---
## 🔐 Security Recommendations
1. **Secrets Management** - Stop using plain text env vars
2. **Close Open Signups** - Vaultwarden signups should be closed
3. **SSH Port Conflict** - Fix Gitea port 22 conflict
4. **Network Mode** - Move Jellyfin from `host` to `bridge`
5. **Version Pinning** - Stop using `latest` tags
---
## 📈 Resource Summary
**Docker Images Total:** ~50GB
**Container Data:** Varies by appdata
**Cache Impact:** High (63% full)
**Top Resource Consumers (Images):**
1. ebook2audiobook: 20.06GB
2. docling-serve: 14.45GB
3. Nextcloud stack: ~7GB
4. open-webui: 4.55GB
5. OpenProject: 2.87GB
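The ranking above can be regenerated with a short pipeline. This is a sketch with a hypothetical helper, `largest_images`, that sorts `docker images` output by size. GNU `sort -h` understands the kB/MB/GB suffixes Docker prints, and `LC_ALL=C` keeps `.` as the decimal point.

```bash
#!/bin/bash
# Hypothetical helper: print the N largest local images, biggest first.
largest_images() {
  local n="${1:-5}"
  docker images --format '{{.Size}} {{.Repository}}:{{.Tag}}' \
    | LC_ALL=C sort -k1,1hr \
    | head -n "$n"
}
```

Image sizes include shared base layers, so the list does not add up to real disk usage. `docker system df` reports the deduplicated totals.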
---
## 🎓 Key Takeaways
1. **6 services are your core** - Keep these running
2. **26 stopped containers** - Cleanup opportunity
3. **~40GB can be freed** - Significant space available
4. **No monitoring** - Critical gap (restart Grafana stack!)
5. **Backup critical** - Vaultwarden and Gitea MUST be backed up
---
**Last Updated:** October 31, 2025
**Next Review:** After cleanup actions completed
**Maintained By:** Weston

954
quick-start.md Normal file

@@ -0,0 +1,954 @@
# 🚀 Quick Start & Emergency Recovery Guide
**Purpose:** Get your homelab back online quickly after disaster
**Target Time:** 30-60 minutes to basic functionality
**Last Updated:** October 31, 2025
---
## 🎯 Quick Access Reference
### Essential URLs
| Service | URL | Default Credentials |
|---------|-----|---------------------|
| **Unraid Dashboard** | http://192.168.68.51 | root / (your password) |
| **Gitea** | https://gitea.segelschiff.app | Weston / (your password) |
| **Vaultwarden** | http://192.168.68.51:4743 | Master password |
| **NPM Admin** | http://192.168.68.51:7818 | admin@example.com / changeme (first login) |
| **Pi-hole** | http://192.168.68.61/admin | (your password) |
| **PiKVM** | https://192.168.68.53 | admin / admin (default) |
### SSH Access
```bash
# Local network
ssh root@192.168.68.51
# Via Tailscale (from anywhere)
ssh root@100.122.220.126
# Emergency: Use PiKVM for console access
# https://192.168.68.53
```
---
## 🆘 Emergency Recovery Scenarios
### Scenario 1: Server Won't Boot 🚨
**Symptoms:**
- No network connectivity to 192.168.68.51
- Unraid WebUI unreachable
- No response to ping
**Recovery Steps:**
1. **Physical Check** (via PiKVM or in person)
```
[ ] Server has power (check LED)
[ ] Network cable connected to eth0
[ ] Monitor shows output (via PiKVM)
[ ] USB boot drive is present and detected
```
2. **Use PiKVM for Remote Console**
- Access: https://192.168.68.53
- Login: admin / admin
- View boot process
- Check BIOS/boot messages
3. **Common Boot Issues**
**USB Boot Drive Failure** (Most common!)
```
Symptoms: "Boot device not found" or similar
Fix:
1. Have backup USB ready
2. Shut down server (via PiKVM power control)
3. Replace USB boot drive
4. Power on
5. Restore configuration from backup
```
**BIOS Settings Changed**
```
Fix:
1. Enter BIOS (DEL/F2 during boot)
2. Load defaults
3. Verify boot order (USB first)
4. Save and exit
```
**Hardware Failure**
```
Check:
1. RAM seated properly
2. All drives detected in BIOS
3. CPU fan spinning
4. No error beeps
```
4. **Boot from Backup USB**
```
Steps:
1. Power off server
2. Insert backup USB boot drive
3. Power on
4. Verify boot successful
5. Restore configuration (if the backup USB is out of date):
- On another computer, rebuild the USB from your latest flash backup ZIP (Unraid USB Flash Creator)
- Boot again
```
**Prevention:**
- ✅ Keep USB flash backup updated (weekly)
- ✅ Store backup USB in safe location
- ✅ Document BIOS settings (screenshots via PiKVM)
---
### Scenario 2: Lost Admin Password
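**Unraid Root Password Reset:** (`passwd root` needs a shell that is already logged in as root. If no session is available, the usual fallback is to shut down, plug the USB flash drive into another computer, delete `config/shadow` and `config/smbpasswd`, boot again, and set a new root password in the WebGUI. This also clears SMB user passwords.)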
1. **Via PiKVM Console**
```
1. Access PiKVM: https://192.168.68.53
2. View console in browser
3. Wait for login prompt
4. Press Ctrl+Alt+F2 (via PiKVM keyboard)
5. At terminal: passwd root
6. Enter new password twice
7. Press Ctrl+Alt+F1 to return to GUI
8. Update documentation
```
2. **Via Physical Access**
```
1. Connect monitor and keyboard to server
2. Press Ctrl+Alt+F2
3. Run: passwd root
4. Set new password
5. Press Ctrl+Alt+F1
```
**Container Passwords:**
- Check `/mnt/user/appdata/<service>/config`
- Review environment variables in Docker templates
- Use Vaultwarden if accessible
- Check this documentation repo in Gitea
---
### Scenario 3: Container Won't Start
**Quick Diagnosis:**
```bash
# Check container status
docker ps -a | grep <container_name>
# View recent logs
docker logs --tail 100 <container_name>
# Show last exit code and any error Docker recorded when starting it
docker inspect --format '{{.State.ExitCode}} {{.State.Error}}' <container_name>
```
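The three checks can be combined into one call. It reads the relevant state fields directly (`.State.Status`, `.State.ExitCode`, `.State.OOMKilled`, `.State.Error`, `.RestartCount`) instead of grepping the full `docker inspect` JSON. This is a sketch, and `diagnose_container` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical one-shot diagnosis for a container that will not start.
diagnose_container() {
  local name="$1" lines="${2:-30}"
  if ! docker inspect "$name" >/dev/null 2>&1; then
    echo "no such container: $name" >&2
    return 1
  fi
  # State fields straight from docker inspect (no grep guessing)
  docker inspect --format \
    'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}} restarts={{.RestartCount}} error="{{.State.Error}}"' \
    "$name"
  echo "--- last ${lines} log lines ---"
  docker logs --tail "$lines" "$name" 2>&1
}
```

Usage: `diagnose_container ollama 50`.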
**Common Fixes:**
**Port Conflict:**
```bash
# Find what's using the port (match ":<port> " so 3000 doesn't also match 30001)
netstat -tulpn | grep ':<port> '
# Example: Port 3000 already in use
netstat -tulpn | grep ':3000 '
# Stop conflicting service
docker stop <conflicting_container>
```
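For ports published by Docker, `netstat` often shows only `docker-proxy` as the owner. `docker ps` can answer the question directly with its `publish` filter. Below is a sketch of a hypothetical wrapper, `who_publishes`, which checks running containers only.

```bash
#!/bin/bash
# Hypothetical wrapper: name the running container(s) publishing a host port.
who_publishes() {
  local port="$1" names
  names="$(docker ps --filter "publish=${port}" --format '{{.Names}}')"
  if [ -n "$names" ]; then
    echo "$names"
  else
    echo "no running container publishes port ${port}" >&2
    return 1
  fi
}
```

Usage: `who_publishes 3000`.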
**Volume Permission Issues:**
```bash
# Check ownership
ls -la /mnt/user/appdata/<container_name>
# Fix permissions (Unraid default nobody:users = 99:100; check the image's expected UID/GID first, e.g. gitea/gitea runs as 1000)
chown -R 99:100 /mnt/user/appdata/<container_name>
# Example: Fix Vaultwarden
chown -R 99:100 /mnt/user/appdata/vaultwarden
```
**Dependency Missing:**
```bash
# Example: Guacamole needs MariaDB
docker start mariadb
sleep 10 # Wait for database initialization
# Verify dependency is running before starting the dependent container
docker ps | grep mariadb
docker start ApacheGuacamole
```
**Resource Exhaustion:**
```bash
# Check cache usage
df -h /mnt/cache
# If cache full (>90%), clean up
docker system prune -a # ⚠️ ALSO DELETES ALL STOPPED CONTAINERS AND UNUSED IMAGES!
# Or free space manually
# See service-inventory.md for cleanup recommendations
```
---
### Scenario 4: Network Connectivity Issues
**Can't Access from LAN:**
```bash
# SSH into Unraid (via PiKVM if network down)
ssh root@192.168.68.51
# Check if br0 is up
ip addr show br0
# Should show: 192.168.68.51/22
# Verify IP and routes
ip route | grep default
# Should show: default via 192.168.68.1
# Test router connectivity
ping -c 3 192.168.68.1
# Test internet
ping -c 3 8.8.8.8
# Test DNS (Pi-hole)
nslookup google.com 192.168.68.61
```
**Fix Network Issues:**
```bash
# Restart networking (from console/PiKVM)
/etc/rc.d/rc.inet1 restart
# If that doesn't work, reboot
reboot
```
**Can't Access Containers:**
```bash
# Check Docker network
docker network inspect bridge
# Verify container IP
docker inspect <container_name> | grep IPAddress
# Test from Unraid host
curl http://172.17.0.5:8080 # Example: open-webui
# Test port mapping
curl http://192.168.68.51:3000 # Should reach open-webui
```
**DNS Not Resolving:**
```bash
# Test Pi-hole directly
nslookup google.com 192.168.68.61
# If Pi-hole down, check Pi Zero
ping 192.168.68.61
# SSH to Pi-hole
ssh pi@192.168.68.61
# Check Pi-hole status
pihole status
# Restart if needed
pihole restartdns
```
---
### Scenario 5: Array Won't Start
**Symptoms:**
- Unraid GUI accessible but array shows "Stopped"
- Disks show errors or missing
**Troubleshooting:**
```bash
# Check disk health
smartctl -a /dev/sdb # Parity
smartctl -a /dev/sdc # Disk 1
# View disk assignments
cat /boot/config/disk.cfg
# Check for filesystem errors (read-only check; array must be started in Maintenance mode)
xfs_repair -n /dev/md1p1
```
**Common Causes:**
- Parity sync in progress (wait for completion)
- Disk failed (check SMART, may need replacement)
- Unclean shutdown (filesystem check required)
- Disk assignment changed
**Recovery:**
1. **Start Array in Maintenance Mode**
- In the Unraid GUI (Main → Array Operation), tick "Maintenance mode", then click "Start"
- Run a filesystem check from the affected disk's settings page if needed
2. **Review Logs**
- Settings → System Log
- Look for disk errors
- Check for power events
3. **If Disk Failed**
- Follow Unraid disk replacement procedure
- Do NOT format or write to disk unnecessarily
- Seek help in Unraid forums if uncertain
---
## 🔧 Critical Service Restart Procedures
### Restart Core Services (Proper Order)
**1. Infrastructure First:**
```bash
# Start reverse proxy (for routing)
docker start NginxProxyManager
# Wait for it to be ready
sleep 5
docker ps | grep NginxProxyManager
# Start tunnel (for remote access)
docker start Unraid-Cloudflared-Tunnel
# Verify both running
docker ps | grep -E "NginxProxyManager|Unraid-Cloudflared-Tunnel"
```
**2. Security Services:**
```bash
# Password manager (critical!)
docker start vaultwarden
# Wait for healthy status
sleep 10
docker ps | grep vaultwarden
# Should show "(healthy)"
# If not healthy, check logs
docker logs --tail 50 vaultwarden
```
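Instead of a fixed `sleep 10`, the script can poll Docker's health status until it settles. Vaultwarden's image defines a healthcheck. For containers without one, this sketch falls back to the plain container state (`running`). `wait_healthy` is a hypothetical helper, and its timeout is approximate because it counts 2-second sleeps.

```bash
#!/bin/bash
# Hypothetical helper: wait until a container is healthy (or running, if it has no healthcheck).
wait_healthy() {
  local name="$1" timeout="${2:-60}" waited=0 state
  while :; do
    # .State.Health only exists when the image defines a HEALTHCHECK
    state="$(docker inspect --format \
      '{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}' "$name")" || return 1
    case "$state" in
      healthy|running) echo "$name: $state"; return 0 ;;
      unhealthy|exited|dead) echo "$name: $state" >&2; return 1 ;;
    esac
    # Any other state (starting, created, restarting): keep waiting
    if (( waited >= timeout )); then
      echo "$name: still '$state' after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$(( waited + 2 ))
  done
}
```

Usage: `docker start vaultwarden && wait_healthy vaultwarden 60 || docker logs --tail 50 vaultwarden`.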
**3. Development Tools:**
```bash
# Git server
docker start Gitea
# Wait for initialization
sleep 5
# Remote access gateway
docker start ApacheGuacamole
# Note: Needs MariaDB if configured
```
**4. Monitoring (IMPORTANT!):**
```bash
# Database first
docker start Influxdb
# Wait for DB to initialize
sleep 15
# Then metrics collector
docker start Telegraf
# Finally visualization
docker start Grafana
# Verify all running
docker ps | grep -E "Influxdb|Telegraf|Grafana"
```
**5. Optional Services:**
```bash
# LLM backend
docker start ollama
sleep 10
# LLM interface
docker start open-webui
# Wait for healthy
docker ps | grep open-webui
```
---
### Stop All Services Gracefully
```bash
# Stop all running containers
docker stop $(docker ps -q)
# Verify all stopped
docker ps
# Should show empty output
# Wait before stopping array
sleep 5
# Stop array (from GUI)
# Main → Array Operation → Stop
```
---
## 📦 Backup & Restore Procedures
### USB Flash Backup (Unraid Configuration)
**Create Backup:**
1. Navigate to: **Main → Flash → Flash Backup**
2. Click "Backup Now"
3. Download ZIP file (e.g., `unraid-flash-backup-20251031.zip`)
4. Store securely OFF-SERVER:
- OneDrive: `/z_Unraid/Backups/`
- External drive
- Cloud storage
**Restore from Backup:**
```
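1. Format new USB drive as FAT32 with the volume label UNRAID
2. Copy backup ZIP to new USB
3. Extract contents to root of USB
- config/ directory
- bzimage, bzroot, etc.
4. Run the make_bootable script from the USB (make_bootable.bat on Windows, as administrator)
5. Safely eject USB
6. Boot from new USB
7. Configuration restored automatically; a new USB also needs the license key transferred (Tools → Registration)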
```
**Frequency:**
- Weekly minimum
- After ANY configuration change
- Before major updates
---
### Container Data Backup
**Critical Directories:**
```
Priority 1 (CRITICAL):
/mnt/user/appdata/vaultwarden/ 🚨 Your passwords!
/mnt/user/appdata/gitea/ 🚨 Your code repositories!
Priority 2 (Important):
/mnt/user/appdata/NginxProxyManager/ Proxy configs
/mnt/user/appdata/Grafana/ Dashboards
/mnt/user/appdata/Influxdb/ Metrics history
Priority 3 (Optional):
/mnt/user/appdata/open-webui/ LLM chat history
```
**Quick Backup Script:**
```bash
#!/bin/bash
# Save as: /mnt/user/scripts/backup-critical.sh
BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"
echo "Stopping containers..."
docker stop vaultwarden Gitea NginxProxyManager
echo "Backing up data..."
tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" /mnt/user/appdata/vaultwarden
tar -czf "$BACKUP_DIR/gitea.tar.gz" /mnt/user/appdata/gitea
tar -czf "$BACKUP_DIR/npm.tar.gz" /mnt/user/appdata/NginxProxyManager
echo "Restarting containers..."
docker start vaultwarden Gitea NginxProxyManager
echo "✅ Backup complete: $BACKUP_DIR"
ls -lh "$BACKUP_DIR"
```
**Make Executable:**
```bash
chmod +x /mnt/user/scripts/backup-critical.sh
```
**Run Manually:**
```bash
/mnt/user/scripts/backup-critical.sh
```
**Schedule (User Scripts Plugin):**
- Frequency: Daily at 2 AM
- Retention: Keep last 30 days
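`backup-critical.sh` never deletes old runs, so the 30-day retention needs a separate step. This is a sketch using `find -mtime`. It assumes each run creates one dated subdirectory under `/mnt/user/backups`, as the script above does. `prune_old_backups` is a hypothetical name. If unsure what it would remove, run `find /mnt/user/backups -mindepth 1 -maxdepth 1 -type d -mtime +30` first.

```bash
#!/bin/bash
# Hypothetical retention helper: delete backup run directories older than N days.
prune_old_backups() {
  local root="$1" keep_days="${2:-30}"
  # Refuse to run on an empty or missing path (guards against rm on the wrong dir)
  if [ -z "$root" ] || [ ! -d "$root" ]; then
    echo "backup root not found: '$root'" >&2
    return 1
  fi
  # Only direct children; -mtime +N = last modified more than N full days ago
  find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$keep_days" \
    -print -exec rm -rf {} +
}
```

If the function is defined in the scheduled script, calling `prune_old_backups /mnt/user/backups 30` at the end applies the retention after each backup.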
---
**Restore from Backup:**
```bash
# Example: Restore Vaultwarden
docker stop vaultwarden
# Backup current (corrupted) data
mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old
# Extract backup
tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C /
# Restart container
docker start vaultwarden
# Verify working
curl http://192.168.68.51:4743
```
---
## ⚡ Quick Commands Reference
### System Status
```bash
# System uptime and load
uptime
# Resource usage
free -h
df -h
# Array status
cat /proc/mdstat
# Docker container summary
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# Temperature (if sensors installed)
sensors
# Disk health quick check
smartctl -H /dev/sdb # Parity
smartctl -H /dev/sdc # Disk 1
```
### Docker Quick Commands
```bash
# Start ALL containers (⚠️ includes experimental/abandoned stopped ones; prefer naming containers)
docker start $(docker ps -aq)
# Stop all running containers
docker stop $(docker ps -q)
# View logs (last 50 lines)
docker logs --tail 50 <container_name>
# Follow logs in real-time
docker logs -f <container_name>
# Restart container
docker restart <container_name>
# Remove container (⚠️ will lose non-volume data!)
docker rm <container_name>
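# Clean up unused resources (⚠️ stopped containers count as unused!)
docker system prune # ⚠️ Removes ALL stopped containers, unused networks, dangling images, build cache
docker system prune -a # ⚠️ Same, plus every image not used by a container!
docker system prune --volumes # ⚠️ Same, plus unused volumes!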
```
### Network Diagnostics
```bash
# Check all interfaces
ip addr show
# Test key infrastructure
ping -c 3 192.168.68.1 # Router
ping -c 3 192.168.68.51 # Unraid
ping -c 3 192.168.68.61 # Pi-hole
ping -c 3 8.8.8.8 # Internet
# DNS resolution test
nslookup google.com
nslookup google.com 192.168.68.61 # Test Pi-hole specifically
# Check listening ports
netstat -tulpn | grep LISTEN
# Test specific port
nc -zv 192.168.68.51 3002 # Example: Gitea
curl -I http://192.168.68.51:3002 # HTTP test
```
### Quick Health Check Script
```bash
#!/bin/bash
# Save as: /mnt/user/scripts/health-check.sh
echo "=== Unraid Health Check ==="
echo ""
echo "1. Array Status:"
grep mdState /proc/mdstat
echo ""
echo "2. Running Containers:"
docker ps --format "table {{.Names}}\t{{.Status}}"
echo ""
echo "3. Disk Usage:"
df -h | grep -E "cache|disk1|Filesystem"
echo ""
echo "4. Network Connectivity:"
ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo " Router: ✅ OK" || echo " Router: ❌ FAIL"
ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo " Internet: ✅ OK" || echo " Internet: ❌ FAIL"
ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo " Pi-hole: ✅ OK" || echo " Pi-hole: ❌ FAIL"
echo ""
echo "5. Critical Services:"
curl -sf --max-time 5 http://localhost:4743 >/dev/null && echo " Vaultwarden: ✅ OK" || echo " Vaultwarden: ❌ DOWN"
curl -sf --max-time 5 http://localhost:3002 >/dev/null && echo " Gitea: ✅ OK" || echo " Gitea: ❌ DOWN"
curl -sf --max-time 5 http://localhost:7818 >/dev/null && echo " NPM: ✅ OK" || echo " NPM: ❌ DOWN"
echo ""
echo "=== Health Check Complete ==="
```
**Run:** `bash /mnt/user/scripts/health-check.sh`
---
## 📞 Getting Help
### Pre-flight Checks
Before asking for help, gather this information:
1. **System Diagnostics**
- Unraid WebGUI: Tools → Diagnostics → Download
- Creates ZIP with all logs
2. **Container Logs**
```bash
docker logs <container_name> > container-logs.txt
```
3. **Network Configuration**
```bash
ip addr show > network-config.txt
ip route show >> network-config.txt
```
4. **Disk Status**
```bash
smartctl -a /dev/sdb > disk-smart.txt
smartctl -a /dev/sdc >> disk-smart.txt
```
### Community Resources
- **Unraid Forums:** https://forums.unraid.net/
- Post diagnostics ZIP
- Be specific about symptoms
- Include what you've tried
- **r/unraid:** https://reddit.com/r/unraid
- Quick questions
- Share diagnostics in pastebin
- **Discord:** Unraid Official Discord
- Real-time help
- Active community
### Emergency Contacts
```
ISP Support: [Your ISP Phone Number]
Unraid License: [Store in secure location]
USB Backup Location: [Document where stored]
Off-site Backup: [If applicable]
```
---
## 🎓 Post-Recovery Checklist
After restoring from disaster:
```
[ ] Unraid array started successfully
[ ] All critical services running
[ ] NginxProxyManager
[ ] Cloudflared
[ ] Vaultwarden
[ ] Gitea
[ ] Network connectivity verified
[ ] Can access Unraid WebUI
[ ] Can ping router (192.168.68.1)
[ ] Internet working
[ ] DNS resolving (Pi-hole)
[ ] Vaultwarden accessible (test password retrieval)
[ ] Gitea accessible (verify repositories intact)
[ ] NPM routing working (test reverse proxy)
[ ] Monitoring stack restarted
[ ] Grafana
[ ] InfluxDB
[ ] Telegraf
[ ] External access working
[ ] Tailscale connected
[ ] Cloudflare tunnel active
[ ] Backups verified and up-to-date
[ ] Documentation updated with lessons learned
[ ] Incident documented in change log (Gitea)
```
---
## 🔒 Security After Recovery
**Immediately After Disaster Recovery:**
1. **Change Passwords** (if compromise suspected)
```
[ ] Unraid root password
[ ] Vaultwarden master password
[ ] Container admin passwords
[ ] Pi-hole admin password
[ ] PiKVM password
```
2. **Review Access Logs**
```bash
# Check SSH attempts (Unraid logs sshd to /var/log/syslog; auth.log may not exist)
grep -h "Failed password" /var/log/syslog /var/log/auth.log 2>/dev/null | tail -50
# Check NPM access
docker logs NginxProxyManager | grep -i error
# Check Gitea access
docker logs Gitea | grep -i login
```
3. **Verify Firewall Rules**
```bash
iptables -L -n -v
```
4. **Check for Unauthorized Changes**
```bash
# Review Docker containers
docker ps -a
# Check cron jobs
crontab -l
# Review network interfaces
ip addr show
```
---
## 📝 Documentation Updates After Incident
**What to Document:**
1. **What Happened:**
- Date/time of incident
- Symptoms observed
- Root cause (if determined)
- Duration of outage
2. **What You Did:**
- Steps taken to recover
- What worked / didn't work
- Resources used (forums, docs, etc.)
- Time to recovery
3. **Lessons Learned:**
- What could prevent this in future
- Process improvements needed
- Documentation gaps discovered
- Backup improvements needed
4. **Action Items:**
- Backups to implement/improve
- Monitoring to add
- Scripts to create
- Hardware to replace/upgrade
**Where to Document:**
- Create incident report: `docs/incidents/YYYY-MM-DD-incident-name.md`
- Update this quick-start guide with new procedures
- Add to troubleshooting section if recurring issue
- Commit to Gitea with detailed message
---
## 🚀 Normal Startup Sequence
**From Cold Boot:**
```
1. Power on server
2. BIOS POST (~30 seconds)
- Hardware check
- Memory test
- Drive detection
3. Unraid boots from USB (~1-2 minutes)
- Linux kernel loads
- Unraid OS starts
4. Network initializes
- br0 interface up
- Gets IP: 192.168.68.51
5. Array auto-starts (if configured)
- Parity disk: sdb
- Data disk: sdc
- Cache: nvme1n1p1
6. Docker service starts
- docker0 bridge created
- Networks initialized
7. Containers auto-start (if enabled)
- Infrastructure services first
- Then application services
8. Services available (~3-5 minutes total)
✅ Ready to use!
```
**Expected Boot Time:** 3-5 minutes
**If Taking Longer:** Check system log for errors
---
## 🎯 Quick Health Check Command
**Run After Any Restart:**
```bash
# Quick one-liner health check (";" so a failed df/grep doesn't print "Network: FAIL")
docker ps --format "table {{.Names}}\t{{.Status}}"; \
df -h | grep -E "cache|disk1"; \
ping -c 2 192.168.68.1 >/dev/null && echo "Network: OK" || echo "Network: FAIL"
```
---
## 📚 Related Documentation
- **Network Issues:** See `network-map.md`
- **Service Details:** See `service-inventory.md`
- **Container Configs:** See `docker-compose/` (when created)
- **Main Overview:** See `README.md`
---
## 🆘 True Emergency - Complete System Down
**If everything is down and you need immediate help:**
1. **Access via PiKVM**
- https://192.168.68.53
- Get console access
- View what's happening
2. **Check Physical Server**
- Power LED on?
- Fans spinning?
- Drives spinning up?
- Network activity lights?
3. **Try Safe Mode Boot**
- Pick "Unraid OS Safe Mode (no plugins, no GUI)" from the boot menu (or the GUI Safe Mode entry for a local desktop)
- Diagnose from console
4. **Community Help**
- Unraid Discord (fastest response)
- Forums with diagnostics ZIP
- r/unraid for quick questions
5. **Document Everything**
- Take photos/screenshots via PiKVM
- Note exact error messages
- Record what you tried
- Timeline of events
---
## 💡 Pro Tips
1. **Test Your Backups**
- Restore test annually
- Verify data integrity
- Practice recovery procedures
2. **Keep This Guide Accessible**
- Save offline copy to phone/laptop
- Print critical sections
- Bookmark in browser
3. **Automate Where Possible**
- Schedule backup scripts
- Set up monitoring alerts
- Use User Scripts plugin
4. **Document As You Go**
- Update after fixing issues
- Add new procedures discovered
- Note what worked/didn't work
---
**Last Updated:** October 31, 2025
**Next Review:** Quarterly or after incidents
**Maintained By:** Weston
---
**Remember:** Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help!
**Keep this guide accessible even when the server is down!**
💡 **Pro Tip:** Save a copy to your phone/laptop/OneDrive!
🚀 **You've got this!**

614
service-inventory.md Normal file

@@ -0,0 +1,614 @@
# 📦 Service Inventory - Complete Container Catalog
**Last Updated:** October 31, 2025
**Total Containers:** 32 (6 running, 26 stopped)
**Purpose:** Comprehensive catalog of all services
---
## 📊 Quick Stats
| Metric | Value | Status |
|--------|-------|--------|
| **Total Containers** | 32 | - |
| **Running** | 6 | ✅ 19% |
| **Stopped** | 26 | ⚠️ 81% |
| **Total Docker Images** | ~50GB | ⚠️ High |
| **Cache Usage** | 578GB / 932GB | ⚠️ 63% |
**Key Insight:** 81% of containers are stopped - cleanup opportunity!
---
## 🟢 Running Services (6 containers)
### 1. open-webui ⭐⭐⭐
**Status:** Running (healthy)
**Container:** open-webui
**Image:** ghcr.io/open-webui/open-webui:main (4.55GB)
**Created:** 2025-10-16 (2 weeks ago)
**Network:** bridge (172.17.0.5)
**Ports:** 8080 → 3000
**Resources:**
- CPU: 0.15%
- Memory: 1.026GB / 60.55GB (1.69%)
- Storage: 42.4MB
**Purpose:** LLM chat interface (ChatGPT-like UI for local models)
**Dependencies:**
- ollama (currently STOPPED ❌)
- OpenAI API key (configured)
**Access:**
- Local: http://192.168.68.51:3000
- No authentication by default
**Issues:**
- ⚠️ Depends on ollama container which is stopped
- ⚠️ OpenAI API key exposed in environment variables
**Recommendations:**
1. **KEEP** - Active LLM interface
2. Restart ollama container to enable local models
3. Move API keys to Docker secrets
4. Enable authentication
**Priority:** HIGH - Core AI/ML service
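Once ollama is back up (item 2), you can confirm that open-webui has local models to offer by asking Ollama's HTTP API, `GET /api/tags`, for the installed models. This sketch assumes Ollama's default port 11434 is published on the Unraid host; the inventory does not list that mapping. `ollama_models` is a hypothetical name. It extracts model names with `grep` rather than a JSON parser.

```bash
#!/bin/bash
# Hypothetical check: list model names reported by Ollama's /api/tags endpoint.
# Assumes Ollama's default port 11434 is published on the host.
ollama_models() {
  local base="${1:-http://192.168.68.51:11434}" body
  body="$(curl -sf --max-time 5 "${base}/api/tags")" || {
    echo "ollama API unreachable at ${base}" >&2
    return 1
  }
  # Crude extraction of "name":"..." pairs; adequate for Ollama's compact JSON
  grep -o '"name":"[^"]*"' <<<"$body" | cut -d'"' -f4
}
```

An empty list means Ollama is reachable but no model has been pulled yet.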
---
### 2. NginxProxyManager ⭐⭐⭐
**Status:** Running
**Container:** NginxProxyManager
**Image:** jlesage/nginx-proxy-manager (189MB)
**Created:** 2025-10-11 (3 weeks ago)
**Network:** bridge (172.17.0.4)
**Ports:** 4443→18443, 8080→1880, 8181→7818
**Resources:**
- CPU: 0.08%
- Memory: 77.45MB (0.12%)
- Storage: 13.4KB
**Purpose:** Reverse proxy with web UI - SSL termination and routing
**Dependencies:** None
**Access:**
- Admin UI: http://192.168.68.51:7818
- HTTP: http://192.168.68.51:1880
- HTTPS: https://192.168.68.51:18443
**Configuration:**
- Routes traffic to backend services
- Manages SSL certificates
- Provides access control
**Recommendations:**
1. **KEEP** - Critical infrastructure
2. Document all proxy rules in Gitea
3. Verify SSL auto-renewal is configured
4. Enable MFA if available
5. Review access logs regularly
**Priority:** CRITICAL - Core infrastructure
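For recommendation 3, certificate expiry can be checked from any machine with `openssl`. A sketch, assuming GNU `date` (for `-d` parsing); the function names are hypothetical. Hostnames routed through the Cloudflare tunnel present Cloudflare's edge certificate, so point this at a hostname that NPM serves directly.

```bash
#!/usr/bin/env bash
# Days remaining until a certificate expires, given openssl's "notAfter=..." line.
# Requires GNU date for the -d parsing.
days_until_expiry() {
  local not_after="${1#notAfter=}" end_ts now_ts
  end_ts="$(date -u -d "$not_after" +%s)" || return 1
  now_ts="$(date -u +%s)"
  echo $(( (end_ts - now_ts) / 86400 ))
}

# Fetch the certificate served for a hostname (sent as SNI) and print the days it has left.
cert_days_left() {
  local host="${1:?hostname required}" port="${2:-443}" line
  line="$(openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate)" || return 1
  days_until_expiry "$line"
}
```

Example: `cert_days_left your-proxy-host.example 18443` (placeholder hostname). A number that keeps shrinking across checks suggests auto-renewal is not running.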
---
### 3. Gitea ⭐⭐⭐
**Status:** Running
**Container:** Gitea
**Image:** gitea/gitea (180MB)
**Created:** 2025-10-08 (3 weeks ago)
**Network:** bridge (172.17.0.3)
**Ports:** 22→22, 3000→3002
**Resources:**
- CPU: 0.11%
- Memory: 114.5MB (0.18%)
- Storage: 113MB (active repositories!)
**Purpose:** Self-hosted Git server (GitHub alternative)
**Dependencies:** None (internal SQLite)
**Access:**
- Web: http://192.168.68.51:3002
- Domain: https://gitea.segelschiff.app
- SSH: ssh://192.168.68.51:22 ⚠️ (conflicts with Unraid SSH)
**Configuration:**
- Using latest tag (unpinned version)
- Storage: /mnt/user/appdata/gitea
**Issues:**
- ⚠️ SSH port 22 conflicts with Unraid SSH
- ⚠️ Using `latest` tag (version not pinned)
- ⚠️ Backup strategy unknown
**Recommendations:**
1. **KEEP** - Critical for version control
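2. Map host port 2222 to the container's SSH port 22 to avoid the conflict, and set Gitea's `SSH_PORT` to 2222 so clone URLs show the right port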
3. Pin to specific version tag
4. Implement automated backups (CRITICAL!)
5. This is your version control hub - protect it!
**Priority:** CRITICAL - Infrastructure documentation depends on this
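For recommendation 4, Gitea ships a `gitea dump` command that bundles repositories, database and configuration into a single archive, as an alternative to stopping the container and tarring appdata. A sketch, assuming the official `gitea/gitea` image layout (binary at `/usr/local/bin/gitea`, config at `/data/gitea/conf/app.ini`, running as the `git` user) and a hypothetical backup share `/mnt/user/backup/gitea`; `gitea_dump` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Run "gitea dump" inside the container and copy the archive to a backup share.
# Assumptions: official gitea/gitea image paths; hypothetical destination directory.
gitea_dump() {
  local container="${1:-Gitea}" dest="${2:-/mnt/user/backup/gitea}"
  local name
  name="gitea-dump-$(date +%Y%m%d-%H%M%S).zip"
  mkdir -p "$dest" || return 1
  docker exec -u git -w /tmp "$container" \
    /usr/local/bin/gitea dump -c /data/gitea/conf/app.ini --file "/tmp/$name" || return 1
  docker cp "$container:/tmp/$name" "$dest/$name" || return 1
  # Remove the in-container copy so /tmp does not fill up over time.
  docker exec "$container" rm -f "/tmp/$name"
  echo "$dest/$name"
}
```

`gitea_dump Gitea` prints the path of the new archive; copy it off-site as well.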
---
### 4. ApacheGuacamole ⭐⭐
**Status:** Running (2+ months uptime!)
**Container:** ApacheGuacamole
**Image:** jasonbean/guacamole (737MB)
**Created:** 2025-08-22 (2+ months ago)
**Network:** bridge (172.17.0.2)
**Ports:** 8080→4000
**Resources:**
- CPU: 0.16%
- Memory: 785.8MB (1.27%)
- Storage: 46.2MB
**Purpose:** Clientless remote desktop gateway (RDP/VNC/SSH via browser)
**Dependencies:**
- MariaDB (STOPPED ❌) - **BROKEN DEPENDENCY!**
**Access:**
- Web: http://192.168.68.51:4000
**Configuration:**
- MySQL enabled but MariaDB stopped
- Multiple auth modules: MySQL, LDAP, TOTP, etc.
**Issues:**
- 🚨 **CRITICAL:** Depends on MariaDB which is stopped!
- Currently using embedded database (not recommended)
- Data loss risk without proper database backend
**Recommendations:**
1. ⚠️ **FIX IMMEDIATELY** - Restart MariaDB or reconfigure
2. If keeping: Start MariaDB and verify connection
3. If not using: Stop Guacamole and remove both
4. Document your use case for remote desktop access
**Priority:** MEDIUM - Fix dependency or remove
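Both open-webui → ollama and Guacamole → MariaDB follow the same pattern: a running app whose dependency is down. A sketch that flags such pairs; the container name `MariaDB` is an assumption (use whatever name `docker ps -a` shows), and `is_running`/`check_deps` are hypothetical names.

```bash
#!/usr/bin/env bash
# True if the named container exists and is running.
is_running() {
  [[ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" == "true" ]]
}

# Arguments are "app:dependency" pairs. Prints one line per running app whose
# dependency is not running, and returns non-zero if any were found.
check_deps() {
  local pair app dep broken=0
  for pair in "$@"; do
    app="${pair%%:*}"
    dep="${pair#*:}"
    if is_running "$app" && ! is_running "$dep"; then
      echo "BROKEN: $app is running but $dep is not"
      broken=1
    fi
  done
  return "$broken"
}
```

Usage: `check_deps open-webui:ollama ApacheGuacamole:MariaDB`. Rerun it after each restart or removal decision.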
---
### 5. Cloudflared ⭐⭐⭐
**Status:** Running (2.5+ months - very stable!)
**Container:** Unraid-Cloudflared-Tunnel
**Image:** figro/unraid-cloudflared-tunnel (8.92MB)
**Created:** 2025-08-10 (2.5+ months ago)
**Network:** bridge (172.17.0.6)
**Ports:** 46495→46495 (metrics)
**Resources:**
- CPU: 0.33% (highest of running containers)
- Memory: 68.6MB (0.11%)
- Network I/O: 41.7MB RX / 310KB TX
**Purpose:** Cloudflare Tunnel - secure external access without port forwarding
**Dependencies:** None
**Access:**
- Metrics: http://192.168.68.51:46495
- Domain: *.segelschiff.app (managed via Cloudflare)
**Configuration:**
- Tunnel token configured
- No auto-update enabled
- Metrics exposed for monitoring
**Security:**
- ⚠️ Tunnel token in plain text environment variable
- ✅ No open ports on router (excellent!)
**Recommendations:**
1. **KEEP** - Excellent security practice
2. Rotate tunnel token periodically
3. Document which services are exposed
4. Integrate metrics with monitoring stack
**Priority:** HIGH - Critical for secure remote access
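As a first step toward recommendation 4, recent `cloudflared` releases also serve a `/ready` endpoint on the metrics address, returning JSON that includes `readyConnections`. A sketch, assuming the cloudflared version in this image provides that endpoint on port 46495; `tunnel_ready_connections` is a hypothetical name, and the count is extracted with `sed` so `jq` is not required.

```bash
#!/usr/bin/env bash
# Print the number of live tunnel connections reported by cloudflared's /ready endpoint.
# Assumption: this cloudflared build serves /ready on the metrics port (46495 here).
tunnel_ready_connections() {
  local base="${1:-http://192.168.68.51:46495}" body
  if ! body="$(curl -fsS --max-time 5 "$base/ready")"; then
    echo 0          # unreachable, or not ready (non-2xx response)
    return 1
  fi
  # Pull the integer after "readyConnections": from the JSON body.
  sed -n 's/.*"readyConnections":[[:space:]]*\([0-9][0-9]*\).*/\1/p' <<<"$body"
}
```

A non-zero exit (printed as `0`) is a simple alert condition for a cron job or the monitoring stack.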
---
### 6. Vaultwarden ⭐⭐⭐
**Status:** Running (healthy) - 3+ months uptime!
**Container:** vaultwarden
**Image:** vaultwarden/server (256MB)
**Created:** 2025-07-31 (3+ months ago)
**Network:** bridge (172.17.0.7)
**Ports:** 80→4743
**Resources:**
- CPU: 0.00% (idle)
- Memory: 24.96MB (0.04%) - Very lightweight!
**Purpose:** Self-hosted password manager (Bitwarden compatible)
**Dependencies:** None
**Access:**
- Web: http://192.168.68.51:4743
- Admin: http://192.168.68.51:4743/admin
**Configuration:**
- Signups allowed: true ⚠️
- Invitations allowed: false ✅
- WebSocket disabled ⚠️
- Admin token exposed ⚠️
**Issues:**
- 🚨 **CRITICAL:** No backup strategy evident!
- ⚠️ Admin token in plain text
- ⚠️ Signups open (verify intentional)
- ⚠️ WebSocket disabled (reduces functionality)
**Recommendations:**
1. **KEEP** - Critical security infrastructure
2. 🚨 **IMPLEMENT BACKUP IMMEDIATELY** - This is your password vault!
3. Close signups after initial setup
4. Rotate admin token and use secrets management
5. Enable WebSocket for better sync
6. Automate daily backups to off-site location
**Priority:** CRITICAL - Contains all your passwords!
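Recommendations 2 and 6 need a dated archive plus retention. A sketch, assuming the container is named `vaultwarden`, its data lives in `/mnt/user/appdata/vaultwarden`, and a hypothetical backup share `/mnt/user/backup/vaultwarden`; the function names are hypothetical. Scheduling (cron or the Unraid User Scripts plugin) and the off-site copy (for example with rclone) are left out.

```bash
#!/usr/bin/env bash
# Archive the Vaultwarden data directory while the container is stopped,
# so the SQLite database is not copied mid-write. Prints the archive path.
backup_vaultwarden() {
  local src="${1:-/mnt/user/appdata/vaultwarden}" dest="${2:-/mnt/user/backup/vaultwarden}"
  local archive rc=0
  archive="$dest/vaultwarden-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest" || return 1
  docker stop vaultwarden >/dev/null || return 1
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || rc=$?
  docker start vaultwarden >/dev/null   # restart even if tar failed
  (( rc == 0 )) && echo "$archive"
  return "$rc"
}

# Keep only the newest N archives (default 14) in the given directory.
prune_backups() {
  local dir="$1" keep="${2:-14}" i
  local files=("$dir"/vaultwarden-*.tar.gz)
  [[ -e "${files[0]}" ]] || return 0     # no archives yet
  # Names embed YYYYmmdd-HHMMSS, so the sorted glob lists the oldest first.
  for (( i = 0; i < ${#files[@]} - keep; i++ )); do
    rm -f -- "${files[i]}"
  done
}
```

A daily job could run `backup_vaultwarden && prune_backups /mnt/user/backup/vaultwarden 14`.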
---
## 🔴 Recently Stopped Services (Worth Investigating)
### 7. ollama ⚠️
**Status:** Exited (128) 4 minutes ago
**Image:** ollama/ollama (3.33GB)
**Purpose:** Local LLM inference engine
**Why It Matters:** open-webui depends on this!
**Recommendations:**
1. 🔧 **RESTART** - Required for open-webui local models
2. Investigate exit code 128 (configuration issue?)
3. Configure GPU acceleration (RTX 4090!)
4. Test with open-webui after restart
**Action:** `docker start ollama && docker logs -f ollama`
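Exit code 128 is not one of Docker's own reserved codes (125-127), and codes above 128 usually mean "128 + signal number", so 128 itself most likely came from ollama's own process. A sketch of hypothetical helpers that print what Docker recorded about the last stop plus a rough interpretation; pair it with `docker logs --tail 100 ollama`.

```bash
#!/usr/bin/env bash
# Rough meaning of a container exit code (Docker reserves 125-127; >128 is usually 128 + signal).
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "clean exit" ;;
    125) echo "the docker daemon or docker run itself failed" ;;
    126) echo "container command could not be invoked" ;;
    127) echo "container command not found" ;;
    137) echo "killed by SIGKILL (often the OOM killer or docker kill)" ;;
    143) echo "terminated by SIGTERM (e.g. docker stop)" ;;
    *)
      if (( code > 128 )); then
        echo "killed by signal $((code - 128))"
      else
        echo "application exit code $code: check the container logs"
      fi ;;
  esac
}

# Show what Docker recorded about the last stop, then the interpretation above.
why_stopped() {
  local name="${1:?container name required}" code
  docker inspect -f 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}} finished={{.State.FinishedAt}} error={{.State.Error}}' "$name" || return 1
  code="$(docker inspect -f '{{.State.ExitCode}}' "$name")" || return 1
  explain_exit_code "$code"
}
```

Usage: `why_stopped ollama`. An `oom=true` line points at memory limits rather than configuration.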
---
### 8. Monitoring Stack (Stopped 12 days ago) 🚨
**Containers:**
- Grafana (stopped 12 days)
- InfluxDB (stopped 12 days)
- Telegraf (stopped 12 days)
**Total Size:** ~1.7GB
**Why Critical:** Zero observability into system health!
**Recommendations:**
1. 🚨 **RESTART IMMEDIATELY** - Priority 1!
2. Configure dashboards for:
- Docker container stats
- System resources (CPU, RAM, disk)
- Network traffic
- Temperature sensors
3. Set up alerting for critical issues
4. Document in runbook
**Action:**
```bash
docker start Influxdb
sleep 15 # Wait for DB initialization
docker start Telegraf
docker start Grafana
```
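The fixed `sleep 15` works, but it can be too short on a cold start. A sketch that polls InfluxDB's `/ping` endpoint (which answers HTTP 204 once the server is up) before starting its consumers; InfluxDB's default port 8086 and its mapping on 192.168.68.51 are assumptions, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Retry once per second until the URL answers without an HTTP error;
# give up after TIMEOUT retries (curl's own 2s limit can stretch the total).
wait_for_http() {
  local url="$1" timeout="${2:-60}" waited=0
  until curl -fsS --max-time 2 -o /dev/null "$url"; do
    if (( waited >= timeout )); then
      echo "timed out after ${timeout} retries waiting for $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Start InfluxDB, wait until it answers /ping, then start Telegraf and Grafana.
# Assumption: InfluxDB's default port 8086 is published on the Unraid host.
start_monitoring() {
  local influx_url="${1:-http://192.168.68.51:8086}"
  docker start Influxdb >/dev/null || return 1
  wait_for_http "$influx_url/ping" 60 || return 1
  docker start Telegraf Grafana >/dev/null
}
```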
---
### 9. MariaDB (Stopped 12 days ago) ⚠️
**Status:** Exited (0) 12 days ago
**Image:** lscr.io/linuxserver/mariadb (348MB)
**Purpose:** MySQL database for Guacamole
**Issue:** Guacamole is running but database is stopped!
**Recommendations:**
1. If using Guacamole: **RESTART**
2. If not using Guacamole: **REMOVE BOTH**
3. Document decision
---
### 10. Database Admin Tools (Stopped 12 days ago)
**CloudBeaver** - Stopped 12 days
**adminer** - Stopped 12 days
**Issue:** Two database admin tools - redundant!
**Recommendations:**
1. **CHOOSE ONE:**
- CloudBeaver: Feature-rich (725MB)
- adminer: Lightweight (118MB)
2. Remove the other
3. Only restart if you need database management
---
## 🟡 Experimental / Inactive Services (Decision Needed)
### 11. Nextcloud AIO Stack (mastercontainer + 7 containers!) 🚨
**Status:** All stopped 3 weeks ago
**Total Size:** ~7GB Docker images + data
**Containers:**
- nextcloud-aio-mastercontainer
- nextcloud-aio-apache
- nextcloud-aio-nextcloud (2.19GB)
- nextcloud-aio-database (PostgreSQL)
- nextcloud-aio-redis
- nextcloud-aio-onlyoffice (3.79GB!)
- nextcloud-aio-imaginary
- nextcloud-aio-notify-push
**Data:** /mnt/user/nextcloud (~1GB+)
**Analysis:**
- Massive resource footprint
- "All-in-One" = heavy coupling
- Stopped for 3 weeks suggests not critical
**Recommendations:**
**DECISION REQUIRED:**
**Option A: Remove Everything**
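```bash
# Back up data first!
cp -a /mnt/user/nextcloud /mnt/user/backup/nextcloud-$(date +%Y%m%d)
# Remove containers (docker does not expand wildcards, so look up the IDs by name)
docker rm $(docker ps -a --filter name=nextcloud-aio -q)
# Remove images to free space
docker rmi $(docker images --format '{{.ID}} {{.Repository}}' | awk '/nextcloud/ {print $1}' | sort -u)
# Named volumes (e.g. the database) survive docker rm; review them before deleting
docker volume ls --filter name=nextcloud_aio
# Archive data
tar -czf nextcloud-data-backup.tar.gz /mnt/user/nextcloud
```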
**Saves:** ~7GB+ space
**Option B: Keep and Restart**
- Document why you need it
- Create restart procedure
- Implement backup strategy
- Monitor resource usage
**My Recommendation:** Remove unless actively needed. Nextcloud is great but this All-in-One stack is heavy.
---
### 12. Jellyfin (Stopped 2 weeks ago) ⚠️
**Status:** Exited (0) 2 weeks ago
**Image:** jellyfin/jellyfin (1.25GB)
**GPU:** RTX 4090 allocated but idle!
**Media:**
- Movies: /mnt/user/movies
- TV: /mnt/user/tv shows
- Music: /mnt/user/music
**Issue:** $1600 GPU sitting idle!
**Recommendations:**
**If you want media server:**
1. **RESTART** with hardware transcoding:
```bash
docker start Jellyfin
```
2. Configure NVENC/NVDEC for RTX 4090
3. Test 4K transcoding performance
4. Switch from `host` network to bridge (security)
**If you don't need media server:**
1. Remove GPU allocation from container
2. Free GPU for other projects (AI/ML)
**Action Required:** Decide on media server strategy
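Before deciding, a quick check shows whether the container actually sees the GPU and which network mode it uses. A sketch, assuming the container is named `Jellyfin`, is running (required for `docker exec`), and that the NVIDIA runtime exposes `nvidia-smi` inside it (usual with the Unraid NVIDIA driver setup, but not guaranteed); `jellyfin_gpu_report` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Report Jellyfin's network mode and whether an NVIDIA GPU is visible inside it.
# Assumes the container is running and the NVIDIA runtime injects nvidia-smi.
jellyfin_gpu_report() {
  local name="${1:-Jellyfin}" mode
  mode="$(docker inspect -f '{{.HostConfig.NetworkMode}}' "$name")" || return 1
  echo "network_mode=$mode"
  if docker exec "$name" nvidia-smi --query-gpu=name --format=csv,noheader >/dev/null 2>&1; then
    echo "gpu=visible"
  else
    echo "gpu=not-visible"
  fi
}
```

`gpu=not-visible` means NVENC/NVDEC cannot work yet, whichever way the decision goes.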
---
### 13. Large AI/ML Containers (Rarely Used)
**ebook2audiobook** - 20.06GB! (stopped 3 weeks)
**docling-serve** - 14.45GB! (stopped 2 weeks)
**Total:** 34.5GB for two containers!
**Analysis:**
- Massive images
- Rarely used (stopped weeks ago)
- Experimental/one-time use?
**Recommendations:**
1. **REMOVE** both to free 34.5GB
2. If needed again, pull fresh images
3. Document use cases if keeping
**Potential Savings:** 34.5GB cache space!
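Removing a container does not remove its image, and the image is where the 34.5GB lives. A sketch that looks up each container's image ID with `docker inspect` before removing both; the container names are taken from this document and may differ from what `docker ps -a` shows, and `remove_container_and_image` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Remove each named container and then the image it was created from.
# Returns non-zero if any container was missing or any removal failed.
remove_container_and_image() {
  local name image status=0
  for name in "$@"; do
    # .Image is the ID of the image the container was created from.
    if ! image="$(docker inspect -f '{{.Image}}' "$name" 2>/dev/null)"; then
      echo "skip: no container named $name" >&2
      status=1
      continue
    fi
    docker rm "$name" && docker rmi "$image" || status=1
  done
  return "$status"
}
```

Usage: `remove_container_and_image ebook2audiobook docling-serve`. `docker rmi` refuses to delete an image still used by another container, which is the safe default.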
---
### 14. Productivity Suite (Multiple Stopped)
**baserow** - Stopped 2 weeks (2.25GB)
**NocoDB** - Stopped 3 weeks (588MB)
**OpenProject** - Stopped 7 weeks (2.87GB)
**Issue:** Three project management tools - redundant!
**Recommendations:**
1. **CHOOSE ONE** (or none if not used)
2. Remove the others
3. Migrate data if needed first
**Potential Savings:** ~5GB
---
### 15. Development Tools
**n8n** (workflow automation) - Created but never started
**steam-headless** - Created but not running
**Recommendations:**
- Document if you have plans for these
- Remove if experimental and abandoned
---
## 📋 Container Decision Matrix
| Container | Keep? | Action | Priority |
|-----------|-------|--------|----------|
| **open-webui** | ✅ Yes | Keep running, restart ollama | HIGH |
| **NginxProxyManager** | ✅ Yes | Keep, document configs | CRITICAL |
| **Gitea** | ✅ Yes | Keep, fix SSH port, backup | CRITICAL |
| **ApacheGuacamole** | ⚠️ Decide | Fix MariaDB OR remove both | MEDIUM |
| **Cloudflared** | ✅ Yes | Keep, rotate token | HIGH |
| **Vaultwarden** | ✅ Yes | Keep, BACKUP NOW! | CRITICAL |
| **ollama** | ✅ Yes | Restart immediately | HIGH |
| **Monitoring Stack** | ✅ Yes | Restart all 3 containers | CRITICAL |
| **MariaDB** | ⚠️ Conditional | If Guacamole stays | MEDIUM |
| **Nextcloud AIO (7)** | ❌ Remove | Backup data, remove stack | LOW |
| **Jellyfin** | ⚠️ Decide | Use GPU or remove | MEDIUM |
| **ebook2audiobook** | ❌ Remove | Free 20GB | LOW |
| **docling-serve** | ❌ Remove | Free 14.5GB | LOW |
| **baserow/NocoDB/OpenProject** | ⚠️ Choose 1 | Remove others | LOW |
| **CloudBeaver/adminer** | ⚠️ Choose 1 | Keep one DB admin | LOW |
---
## 🎯 Recommended Action Plan
### Phase 1: Critical (Do First!) 🚨
1. **Backup Vaultwarden** (30 min)
```bash
docker stop vaultwarden
tar -czf vaultwarden-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/vaultwarden
docker start vaultwarden
```
2. **Backup Gitea** (30 min)
```bash
docker stop Gitea
tar -czf gitea-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/gitea
docker start Gitea
```
3. **Restart Monitoring Stack** (15 min)
```bash
docker start Influxdb && sleep 15
docker start Telegraf Grafana
# Configure dashboards
```
4. **Restart ollama** (5 min)
```bash
docker start ollama
docker logs -f ollama
```
### Phase 2: Cleanup (Free Space!) 💾
5. **Remove Large Unused Containers** (1 hour)
- ebook2audiobook (20GB)
- docling-serve (14.5GB)
- Nextcloud AIO stack (7GB)
- **Saves: ~41GB!**
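6. **Docker Cleanup** (images and build cache only)
```bash
# Not `docker system prune -a`: it also deletes every stopped container, including Phase 3 candidates.
docker system df         # preview reclaimable space
docker image prune -a    # images with no container at all (stopped containers keep theirs)
docker builder prune     # build cache
```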
### Phase 3: Decisions (This Week)
7. **Guacamole + MariaDB** - Keep or remove?
8. **Jellyfin** - Restart with GPU or remove?
9. **Productivity tools** - Choose one, remove others
10. **Database admin** - CloudBeaver or adminer?
---
## 📊 Storage Cleanup Impact
**Current Cache Usage:** 578GB / 932GB (63%)
**After Recommended Cleanup:**
- Remove ebook2audiobook: -20GB
- Remove docling-serve: -14.5GB
- Remove Nextcloud AIO: -7GB
- Docker system prune: ~10-20GB
- **Total Freed: ~50-60GB**
**New Cache Usage:** ~520GB / 932GB (56%) ✅
---
## 🔐 Security Recommendations
1. **Secrets Management** - Stop using plain text env vars
2. **Close Open Signups** - Vaultwarden signups should be closed
3. **SSH Port Conflict** - Fix Gitea port 22 conflict
4. **Network Mode** - Move Jellyfin from `host` to `bridge`
5. **Version Pinning** - Stop using `latest` tags
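For item 5, a quick audit lists containers whose image reference is unpinned. A sketch that treats a missing tag as `:latest`, also flags the moving `main` and `nightly` tags (open-webui runs `:main`), and skips digest-pinned references; which tags count as "moving" is a judgment call, and `unpinned_containers` is a hypothetical name.

```bash
#!/usr/bin/env bash
# List "name image" for containers whose image uses a moving tag or no tag at all.
unpinned_containers() {
  local name image ref tag
  docker ps -a --format '{{.Names}} {{.Image}}' | while read -r name image; do
    [[ "$image" == *@sha256:* ]] && continue       # digest-pinned: skip
    ref="${image##*/}"                              # strip registry and namespace
    if [[ "$ref" == *:* ]]; then
      tag="${ref##*:}"
    else
      tag="latest"                                  # no tag means :latest
    fi
    case "$tag" in
      latest|main|nightly) echo "$name $image" ;;   # moving tags
    esac
  done
}
```

Each line printed is a container to repin to an explicit version tag.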
---
## 📈 Resource Summary
**Docker Images Total:** ~50GB
**Container Data:** Varies by appdata
**Cache Impact:** High (63% full)
**Top Resource Consumers (Images):**
1. ebook2audiobook: 20.06GB
2. docling-serve: 14.45GB
3. Nextcloud stack: ~7GB
4. open-webui: 4.55GB
5. OpenProject: 2.87GB
---
## 🎓 Key Takeaways
1. **6 services are your core** - Keep these running
2. **26 stopped containers** - Cleanup opportunity
3. **~40GB+ can be freed** - ~41GB from three unused stacks alone, ~50-60GB including image pruning
4. **No monitoring** - Critical gap (restart Grafana stack!)
5. **Backup critical** - Vaultwarden and Gitea MUST be backed up
---
**Last Updated:** October 31, 2025
**Next Review:** After cleanup actions completed
**Maintained By:** Weston