Phase 1 Complete: Foundation documentation
Added comprehensive homelab documentation:
README.md:
- Hardware inventory and specifications
- Network architecture overview
- Running services catalog
- Quick reference commands
- Project goals and roadmap
docs/network-map.md:
- All device IP assignments
- Port reference guide
- DNS configuration (Pi-hole + Unbound)
- Remote access setup (Tailscale + Cloudflare)
- Troubleshooting commands
docs/service-inventory.md:
- All 32 Docker containers cataloged
- Running services analysis (6 containers)
- Stopped services review (26 containers)
- Resource usage and recommendations
- Container decision matrix
- Cleanup plan to free 40GB
- Security recommendations
- Prioritized action plan
docs/quick-start.md:
- Emergency recovery procedures
- Service restart sequences
- Backup/restore guides with scripts
- Troubleshooting by scenario
- Health check automation
- Post-recovery checklist
- Common problem solutions
This establishes the foundation for all future homelab projects.
Phase 1 documentation complete! 🎉
292
docs/network-map.md
Normal file
@@ -0,0 +1,292 @@
# 🌐 Network Map & Topology

**Last Updated:** October 31, 2025
**Network Range:** 192.168.68.0/22
**Maintained By:** Weston

---

## 📊 Quick Reference

| Device | IP Address | Purpose |
|--------|-----------|---------|
| **TP-Link Router** | 192.168.68.1 | Gateway, DHCP, Mesh Primary |
| **Foxtrot (Gaming PC)** | 192.168.68.50 | Workstation |
| **Unraid Server (Tower)** | 192.168.68.51 | Main infrastructure |
| **PiKVM** | 192.168.68.53 | Server out-of-band management |
| **Pi-hole (Pi Zero 2W)** | 192.168.68.61 | DNS + Ad-blocking + Unbound |
| **Code-Server VM** | 192.168.68.70 | Ubuntu headless + VS Code |
| **TP-Link Mesh Node** | 192.168.71.250 | Office WiFi extender |

---

## 🗺️ Physical Network Topology

```
                         Internet
                             │
                             │ (WAN)
                             │
                     ┌───────┴────────┐
                     │ TP-Link Router │
                     │  192.168.68.1  │
                     │ (Mesh Primary) │
                     └───────┬────────┘
                             │ (LAN - Mesh Network)
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
    ┌────┴─────┐        ┌────┴─────┐        ┌────┴─────┐
    │ TP-Link  │        │  Unraid  │        │ Pi Zero  │
    │Mesh Node │        │  Server  │        │ Pi-hole  │
    │ .71.250  │        │  Tower   │        │ Unbound  │
    │ (Office) │        │  .68.51  │        │  .68.61  │
    └────┬─────┘        └────┬─────┘        └──────────┘
         │                   │
    ┌────┴────┐        ┌─────┼─────┐
    │         │        │     │     │
┌───┴───┐ ┌───┴──┐ ┌───┴───┐ │ ┌───┴───┐
│Foxtrot│ │Laptop│ │ PiKVM │ │ │  VM:  │
│Gaming │ │(WiFi)│ │.68.53 │ │ │ Code  │
│  PC   │ │      │ │(Direct│ │ │Server │
│.68.50 │ │      │ │to Svr)│ │ │.68.70 │
└───────┘ └──────┘ └───────┘ │ └───────┘
                             │
                       (Server VMs)
```

---

## 🖥️ Unraid Server Virtual Network

```
Physical: eth0 (2.5GbE) → bond0 → br0 (192.168.68.51)
                          │
     ┌────────────────────┼────────────────────┐
     │                    │                    │
┌────┴─────┐        ┌─────┴──────┐       ┌─────┴─────┐
│   VMs    │        │   Docker   │       │ Tailscale │
│          │        │            │       │    VPN    │
└────┬─────┘        └─────┬──────┘       └───────────┘
     │                    │             100.122.220.126
     │               ┌────┴─────┐
┌────┴─────┐         │ docker0  │
│Code-Srvr │         │172.17.0.1│
│  .68.70  │         └────┬─────┘
│ (Ubuntu) │              │
└──────────┘         ┌────┼────────┬──────┐
                     │    │        │      │
                ┌────┴┐ ┌─┴──┐ ┌───┴──┐ ┌─┴───┐
                │open-│ │NPM │ │Gitea │ │Guac │
                │webui│ │ .4 │ │  .3  │ │ .2  │
                │ .5  │ └────┘ └──────┘ └─────┘
                └─────┘
```

---

## 📍 Complete IP Address Table

### Infrastructure & Services

| Device/Service | IP Address | MAC | Type | Notes |
|---------------|-----------|-----|------|-------|
| **TP-Link Router** | 192.168.68.1 | - | Physical | Gateway, DHCP, primary mesh |
| **Foxtrot (Gaming PC)** | 192.168.68.50 | - | Physical | Workstation, static IP |
| **Unraid Server** | 192.168.68.51 | 58:47:ca:7b:97:b0 | Physical | Main server, static IP |
| **PiKVM** | 192.168.68.53 | - | Physical | Direct to server, management |
| **Pi-hole (Pi Zero 2W)** | 192.168.68.61 | - | Physical | DNS/ad-block/Unbound, static |
| **Code-Server VM** | 192.168.68.70 | - | Virtual | Ubuntu + VS Code, KVM/QEMU |
| **Laptop** | DHCP | - | Physical | Mobile device, WiFi |
| **TP-Link Mesh Node** | 192.168.71.250 | - | Physical | Office WiFi extender |

### Docker Containers (172.17.0.0/16)

| Container | Docker IP | Host Port | Purpose |
|-----------|-----------|-----------|---------|
| **ApacheGuacamole** | 172.17.0.2 | 4000 | Remote desktop gateway |
| **Gitea** | 172.17.0.3 | 3002, 22 | Git server |
| **NginxProxyManager** | 172.17.0.4 | 1880, 7818, 18443 | Reverse proxy |
| **open-webui** | 172.17.0.5 | 3000 | LLM interface |
| **Cloudflared** | 172.17.0.6 | 46495 | Cloudflare tunnel |
| **Vaultwarden** | 172.17.0.7 | 4743 | Password manager |

### VPN

| Service | IP | Network | Purpose |
|---------|----|---------|---------|
| **Tailscale** | 100.122.220.126 | 100.64.0.0/10 | Secure remote access |

---

## 🌐 Network Details

**Subnet:** 192.168.68.0/22
**Netmask:** 255.255.252.0
**Usable Range:** 192.168.68.1 - 192.168.71.254 (1022 hosts)
**Gateway:** 192.168.68.1
**Primary DNS:** 192.168.68.61 (Pi-hole)
**Secondary DNS:** 9.9.9.9 (Quad9)
**Broadcast:** 192.168.71.255
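
The host count and the fact that the mesh node at 192.168.71.250 shares a subnet with the 192.168.68.x devices both follow from the /22 prefix. A minimal sketch of that arithmetic, assuming bash; the function names `ip_to_int`, `usable_hosts`, and `in_subnet` are illustrative and do not exist on the server:

```bash
#!/bin/bash
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Usable host count for a prefix length (network and broadcast excluded).
usable_hosts() {
  echo $(( (1 << (32 - $1)) - 2 ))
}

# Succeed if address $1 falls inside network $2 with prefix length $3.
in_subnet() {
  local mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

usable_hosts 22
in_subnet 192.168.71.250 192.168.68.0 22 && echo "mesh node is inside the /22"
in_subnet 192.168.72.1 192.168.68.0 22 || echo "192.168.72.1 is outside the /22"
```

The same helpers can be used to sanity-check a new static IP before assigning it.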

---

## 🔌 Port Reference Guide

### Unraid Server Ports

| Service | Port | Protocol | URL |
|---------|------|----------|-----|
| **Unraid WebUI** | 80 | HTTP | http://192.168.68.51 |
| **Unraid SSL** | 443 | HTTPS | https://192.168.68.51 |
| **SMB** | 445 | TCP | \\\\192.168.68.51 |
| **SSH** | 22 | TCP | ssh root@192.168.68.51 |

### Container Access

| Service | URL | Port | Notes |
|---------|-----|------|-------|
| **open-webui** | http://192.168.68.51:3000 | 3000 | LLM chat interface |
| **Gitea** | http://192.168.68.51:3002 | 3002 | Git web UI |
| **Gitea (domain)** | https://gitea.segelschiff.app | 443 | Via Cloudflare |
| **NPM Web** | http://192.168.68.51:1880 | 1880 | Proxy frontend |
| **NPM Admin** | http://192.168.68.51:7818 | 7818 | Management UI |
| **Guacamole** | http://192.168.68.51:4000 | 4000 | Remote desktop |
| **Vaultwarden** | http://192.168.68.51:4743 | 4743 | Password vault |

### Infrastructure Access

| Service | URL | Default Port |
|---------|-----|--------------|
| **PiKVM** | https://192.168.68.53 | 443 |
| **Pi-hole Admin** | http://192.168.68.61/admin | 80 |
| **Code-Server** | http://192.168.68.70:8080 | 8080 (typical) |

---

## 🛡️ DNS Configuration

**Primary:** Pi-hole (192.168.68.61)
- Ad-blocking
- Local DNS records
- Query logging
- DHCP relay

**Upstream:** Unbound (same device)
- Recursive DNS resolver
- No forwarding to ISP
- Privacy-focused
- DNSSEC validation

**Resolution Flow:**
```
Client → Pi-hole (192.168.68.61) → Unbound → Root Servers
```

**Fallback:** 9.9.9.9 (Quad9) - Privacy-respecting public DNS

---

## 🌐 Remote Access

### Cloudflare Tunnel
```
Internet → Cloudflare Edge → Tunnel → NPM → Services
```
- **Domain:** *.segelschiff.app
- **Services Exposed:** Gitea (and others via NPM)
- **Benefits:** No open ports, DDoS protection, SSL
- **Container:** Cloudflared (172.17.0.6)

### Tailscale VPN
```
Remote Device → Encrypted Tunnel → Unraid (100.122.220.126)
```
- **Network:** 100.64.0.0/10 (CGNAT)
- **Protocol:** WireGuard
- **Benefits:** Zero-trust, peer-to-peer, NAT traversal
- **Access:** Full homelab as if local

---

## 📊 Network Performance

| Link | Capacity | Usage | Status |
|------|----------|-------|--------|
| **Unraid NIC** | 2.5 Gbps | <1% | Underutilized |
| **Mesh Backhaul** | Unknown | Unknown | Check model specs |
| **Internet WAN** | Unknown | Unknown | ISP dependent |

**Observed (eth0):** ~2 Mbps average = 0.08% of 2.5G capacity
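
The utilization figure is observed throughput divided by link capacity. A small sketch of that calculation, assuming `awk` is available; the helper names `rate_mbps` and `link_utilization` are made up for illustration, and the byte counters would typically be read from `/sys/class/net/eth0/statistics/rx_bytes` at two points in time:

```bash
#!/bin/bash
# Average rate in Mbps from two byte-counter samples taken $3 seconds apart.
rate_mbps() {
  awk -v b1="$1" -v b2="$2" -v secs="$3" \
    'BEGIN { printf "%.2f\n", (b2 - b1) * 8 / secs / 1000000 }'
}

# Percentage of link capacity in use; both arguments in Mbps.
link_utilization() {
  awk -v used="$1" -v cap="$2" 'BEGIN { printf "%.2f%%\n", used / cap * 100 }'
}

rate_mbps 0 2500000 10      # 2.5 MB transferred in 10 s
link_utilization 2 2500     # 2 Mbps on a 2.5 Gbps link
```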

---

## 🔧 Troubleshooting Commands

### Connectivity Tests
```bash
# Test key infrastructure
ping 192.168.68.1      # Router
ping 192.168.68.51     # Unraid
ping 192.168.68.61     # Pi-hole
ping 192.168.68.70     # Code-Server VM
ping 8.8.8.8           # Internet

# DNS tests
nslookup google.com 192.168.68.61    # Test Pi-hole
dig @192.168.68.61 example.com       # Detailed DNS query
```

### Network Status (from Unraid)
```bash
# Interfaces
ip addr show
ip link show

# Routes
ip route show

# Active connections
ss -tulpn

# Docker networks
docker network ls
docker network inspect bridge
```

### VM Network (Code-Server)
```bash
# List VMs
virsh list --all

# Get VM IP
virsh domifaddr <vm-name>

# VM network info
virsh net-info default
```

---

## 📝 Recommendations

### Security
1. ⚠️ **Separate Gitea SSH port** - Currently conflicts with Unraid SSH (both port 22)
2. ⚠️ **Implement VLANs** - Segment management/services/workstations
3. ⚠️ **Firewall hardening** - Move from ACCEPT-all to explicit rules

### Performance
1. Monitor mesh performance between nodes
2. Document ISP speeds and plan accordingly
3. Consider 10GbE upgrade path (future)

### Documentation
1. ✅ Document Code-Server VM configuration
2. ✅ Record TP-Link mesh model and capabilities
3. ✅ Map exact ISP speeds and plan

---

**Last Updated:** October 31, 2025
**Next Review:** When network topology changes
**Quick Access:** See README.md for service URLs
954
docs/quick-start.md
Normal file
@@ -0,0 +1,954 @@
# 🚀 Quick Start & Emergency Recovery Guide

**Purpose:** Get your homelab back online quickly after disaster
**Target Time:** 30-60 minutes to basic functionality
**Last Updated:** October 31, 2025

---

## 🎯 Quick Access Reference

### Essential URLs

| Service | URL | Default Credentials |
|---------|-----|---------------------|
| **Unraid Dashboard** | http://192.168.68.51 | root / (your password) |
| **Gitea** | https://gitea.segelschiff.app | Weston / (your password) |
| **Vaultwarden** | http://192.168.68.51:4743 | Master password |
| **NPM Admin** | http://192.168.68.51:7818 | admin@example.com / changeme (first login) |
| **Pi-hole** | http://192.168.68.61/admin | (your password) |
| **PiKVM** | https://192.168.68.53 | admin / admin (default) |

### SSH Access

```bash
# Local network
ssh root@192.168.68.51

# Via Tailscale (from anywhere)
ssh root@100.122.220.126

# Emergency: Use PiKVM for console access
# https://192.168.68.53
```

---

## 🆘 Emergency Recovery Scenarios

### Scenario 1: Server Won't Boot 🚨

**Symptoms:**
- No network connectivity to 192.168.68.51
- Unraid WebUI unreachable
- No response to ping

**Recovery Steps:**

1. **Physical Check** (via PiKVM or in person)
   ```
   [ ] Server has power (check LED)
   [ ] Network cable connected to eth0
   [ ] Monitor shows output (via PiKVM)
   [ ] USB boot drive is present and detected
   ```

2. **Use PiKVM for Remote Console**
   - Access: https://192.168.68.53
   - Login: admin / admin
   - View boot process
   - Check BIOS/boot messages

3. **Common Boot Issues**

   **USB Boot Drive Failure** (Most common!)
   ```
   Symptoms: "Boot device not found" or similar

   Fix:
   1. Have backup USB ready
   2. Shut down server (via PiKVM power control)
   3. Replace USB boot drive
   4. Power on
   5. Restore configuration from backup
   ```

   **BIOS Settings Changed**
   ```
   Fix:
   1. Enter BIOS (DEL/F2 during boot)
   2. Load defaults
   3. Verify boot order (USB first)
   4. Save and exit
   ```

   **Hardware Failure**
   ```
   Check:
   1. RAM seated properly
   2. All drives detected in BIOS
   3. CPU fan spinning
   4. No error beeps
   ```

4. **Boot from Backup USB**
   ```
   Steps:
   1. Power off server
   2. Insert backup USB boot drive
   3. Power on
   4. Verify boot successful
   5. Restore configuration:
      - Tools → Flash Backup → Browse → Select backup ZIP
      - Reboot
   ```

**Prevention:**
- ✅ Keep USB flash backup updated (weekly)
- ✅ Store backup USB in safe location
- ✅ Document BIOS settings (screenshots via PiKVM)

---

### Scenario 2: Lost Admin Password

**Unraid Root Password Reset:**

Both console methods below run `passwd root`, so they only help if a root shell can still be opened on the console. If no login works at all, the fallback documented by Unraid is to power down, open the USB flash drive on another computer, remove `config/passwd`, `config/shadow`, and `config/smbpasswd`, then boot and set a new root password.

1. **Via PiKVM Console**
   ```
   1. Access PiKVM: https://192.168.68.53
   2. View console in browser
   3. Wait for login prompt
   4. Press Ctrl+Alt+F2 (via PiKVM keyboard)
   5. At terminal: passwd root
   6. Enter new password twice
   7. Press Ctrl+Alt+F1 to return to GUI
   8. Update documentation
   ```

2. **Via Physical Access**
   ```
   1. Connect monitor and keyboard to server
   2. Press Ctrl+Alt+F2
   3. Run: passwd root
   4. Set new password
   5. Press Ctrl+Alt+F1
   ```

**Container Passwords:**
- Check `/mnt/user/appdata/<service>/config`
- Review environment variables in Docker templates
- Use Vaultwarden if accessible
- Check this documentation repo in Gitea

---

### Scenario 3: Container Won't Start

**Quick Diagnosis:**

```bash
# Check container status
docker ps -a | grep <container_name>

# View recent logs
docker logs --tail 100 <container_name>

# Look for errors
docker inspect <container_name> | grep -i error
```

**Common Fixes:**

**Port Conflict:**
```bash
# Find what's using the port
netstat -tulpn | grep <port>

# Example: Port 3000 already in use
netstat -tulpn | grep 3000

# Stop conflicting service
docker stop <conflicting_container>
```
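
Conflicts can also be spotted on paper before a container is started. The sketch below takes `name port` pairs on stdin and reports any host port claimed more than once; the function name `find_port_conflicts` and the label `unraid-sshd` are illustrative, and the sample pairs are typed in by hand from the port tables in network-map.md rather than read from Docker:

```bash
#!/bin/bash
# Read "name port" pairs on stdin and print each port claimed more than once,
# followed by the names that claim it.
find_port_conflicts() {
  awk '
    {
      if ($2 in owners) owners[$2] = owners[$2] "," $1
      else owners[$2] = $1
      count[$2]++
    }
    END { for (p in count) if (count[p] > 1) print p ": " owners[p] }
  ' | sort -n
}

find_port_conflicts <<'EOF'
unraid-sshd 22
Gitea 22
Gitea 3002
open-webui 3000
EOF
```

With the sample input this flags port 22, which matches the Gitea/Unraid SSH overlap noted in the network map recommendations.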

**Volume Permission Issues:**
```bash
# Check ownership
ls -la /mnt/user/appdata/<container_name>

# Fix permissions (Unraid standard: 99:100)
chown -R 99:100 /mnt/user/appdata/<container_name>

# Example: Fix Vaultwarden
chown -R 99:100 /mnt/user/appdata/vaultwarden
```

**Dependency Missing:**
```bash
# Example: Guacamole needs MariaDB
docker start mariadb
sleep 10  # Wait for database initialization
docker start ApacheGuacamole

# Verify dependency is running
docker ps | grep mariadb
```
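
A fixed `sleep` is a guess at how long the dependency needs. One possible alternative is to poll until a probe succeeds. This is a sketch only: `wait_for` is a made-up helper, and the MariaDB port (3306) being reachable from the host is an assumption to check against your own template:

```bash
#!/bin/bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command...>
wait_for() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    sleep "$delay"
  done
  return 1
}

# Hypothetical usage: wait up to ~30 s for the database port, then start Guacamole.
# docker start mariadb
# wait_for 15 2 nc -z 127.0.0.1 3306 && docker start ApacheGuacamole
```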

**Resource Exhaustion:**
```bash
# Check cache usage
df -h /mnt/cache

# If cache full (>90%), clean up
docker system prune -a  # ⚠️ REMOVES UNUSED IMAGES!

# Or free space manually
# See service-inventory.md for cleanup recommendations
```

---

### Scenario 4: Network Connectivity Issues

**Can't Access from LAN:**

```bash
# SSH into Unraid (or use the PiKVM console if the network is down)
ssh root@192.168.68.51

# Check if br0 is up
ip addr show br0
# Should show: 192.168.68.51/22

# Verify IP and routes
ip route | grep default
# Should show: default via 192.168.68.1

# Test router connectivity
ping -c 3 192.168.68.1

# Test internet
ping -c 3 8.8.8.8

# Test DNS (Pi-hole)
nslookup google.com 192.168.68.61
```

**Fix Network Issues:**

```bash
# Restart networking (from console/PiKVM)
/etc/rc.d/rc.inet1 restart

# If that doesn't work, reboot
reboot
```

**Can't Access Containers:**

```bash
# Check Docker network
docker network inspect bridge

# Verify container IP
docker inspect <container_name> | grep IPAddress

# Test from Unraid host
curl http://172.17.0.5:8080  # Example: open-webui

# Test port mapping
curl http://192.168.68.51:3000  # Should reach open-webui
```

**DNS Not Resolving:**

```bash
# Test Pi-hole directly
nslookup google.com 192.168.68.61

# If Pi-hole down, check Pi Zero
ping 192.168.68.61

# SSH to Pi-hole
ssh pi@192.168.68.61

# Check Pi-hole status
pihole status

# Restart if needed
pihole restartdns
```

---

### Scenario 5: Array Won't Start

**Symptoms:**
- Unraid GUI accessible but array shows "Stopped"
- Disks show errors or missing

**Troubleshooting:**

```bash
# Check disk health
smartctl -a /dev/sdb  # Parity
smartctl -a /dev/sdc  # Disk 1

# View disk assignments
cat /boot/config/disk.cfg

# Check for filesystem errors (read-only check)
xfs_repair -n /dev/md1p1
```

**Common Causes:**
- Parity sync in progress (wait for completion)
- Disk failed (check SMART, may need replacement)
- Unclean shutdown (filesystem check required)
- Disk assignment changed

**Recovery:**

1. **Start Array in Maintenance Mode**
   - Click "Start" in Unraid GUI
   - Select "Maintenance mode" if prompted
   - Run filesystem check if prompted

2. **Review Logs**
   - Settings → System Log
   - Look for disk errors
   - Check for power events

3. **If Disk Failed**
   - Follow Unraid disk replacement procedure
   - Do NOT format or write to disk unnecessarily
   - Seek help in Unraid forums if uncertain

---

## 🔧 Critical Service Restart Procedures

### Restart Core Services (Proper Order)

**1. Infrastructure First:**
```bash
# Start reverse proxy (for routing)
docker start NginxProxyManager

# Wait for it to be ready
sleep 5
docker ps | grep NginxProxyManager

# Start tunnel (for remote access)
docker start Cloudflared

# Verify both running
docker ps | grep -E "NginxProxyManager|Cloudflared"
```

**2. Security Services:**
```bash
# Password manager (critical!)
# Container names are case-sensitive; use the exact name shown by `docker ps -a`
docker start vaultwarden

# Wait for healthy status
sleep 10
docker ps | grep -i vaultwarden
# Should show "(healthy)"

# If not healthy, check logs
docker logs --tail 50 vaultwarden
```

**3. Development Tools:**
```bash
# Git server
docker start Gitea

# Wait for initialization
sleep 5

# Remote access gateway
docker start ApacheGuacamole
# Note: Needs MariaDB if configured
```

**4. Monitoring (IMPORTANT!):**
```bash
# Database first
docker start Influxdb

# Wait for DB to initialize
sleep 15

# Then metrics collector
docker start Telegraf

# Finally visualization
docker start Grafana

# Verify all running
docker ps | grep -E "Influxdb|Telegraf|Grafana"
```

**5. Optional Services:**
```bash
# LLM backend
docker start ollama
sleep 10

# LLM interface
docker start open-webui

# Wait for healthy
docker ps | grep open-webui
```
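
The five stages above can be collapsed into one ordered list. The following is a sketch under assumptions, not a replacement for the staged procedure: it omits the waits between stages, the names `START_ORDER` and `start_in_order` are invented here, and the `DOCKER` variable exists only so the sequence can be previewed with `DOCKER=echo`:

```bash
#!/bin/bash
# Start containers in dependency order. DOCKER can be overridden
# (e.g. DOCKER=echo) to preview the commands without running them.
DOCKER=${DOCKER:-docker}

START_ORDER=(
  NginxProxyManager Cloudflared   # infrastructure
  vaultwarden                     # security
  Gitea ApacheGuacamole           # development tools
  Influxdb Telegraf Grafana       # monitoring
  ollama open-webui               # optional
)

start_in_order() {
  local name
  for name in "$@"; do
    "$DOCKER" start "$name" || echo "WARN: could not start $name" >&2
  done
}

# Dry run: prints "start <name>" for each container instead of starting it.
DOCKER=echo start_in_order "${START_ORDER[@]}"
```

Dropping the `DOCKER=echo` prefix on the last line runs the real `docker start` commands.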

---

### Stop All Services Gracefully

```bash
# Stop all running containers
docker stop $(docker ps -q)

# Verify all stopped
docker ps
# Should list no containers (header row only)

# Wait before stopping array
sleep 5

# Stop array (from GUI)
# Main → Array Operation → Stop
```

---

## 📦 Backup & Restore Procedures

### USB Flash Backup (Unraid Configuration)

**Create Backup:**
1. Navigate to: **Main → Flash → Flash Backup**
2. Click "Backup Now"
3. Download ZIP file (e.g., `unraid-flash-backup-20251031.zip`)
4. Store securely OFF-SERVER:
   - OneDrive: `/z_Unraid/Backups/`
   - External drive
   - Cloud storage

**Restore from Backup:**
```
1. Format new USB drive (if needed)
2. Copy backup ZIP to new USB
3. Extract contents to root of USB
   - config/ directory
   - bzimage, bzroot, etc.
4. Safely eject USB
5. Boot from new USB
6. Configuration restored automatically
```

**Frequency:**
- Weekly minimum
- After ANY configuration change
- Before major updates

---

### Container Data Backup

**Critical Directories:**

```
Priority 1 (CRITICAL):
/mnt/user/appdata/vaultwarden/          🚨 Your passwords!
/mnt/user/appdata/gitea/                🚨 Your code repositories!

Priority 2 (Important):
/mnt/user/appdata/NginxProxyManager/    Proxy configs
/mnt/user/appdata/Grafana/              Dashboards
/mnt/user/appdata/Influxdb/             Metrics history

Priority 3 (Optional):
/mnt/user/appdata/open-webui/           LLM chat history
```

**Quick Backup Script:**

```bash
#!/bin/bash
# Save as: /mnt/user/scripts/backup-critical.sh
set -euo pipefail

CONTAINERS=(vaultwarden Gitea NginxProxyManager)
BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Restart the containers on exit, even if a tar step fails partway through
trap 'echo "Restarting containers..."; docker start "${CONTAINERS[@]}"' EXIT

echo "Stopping containers..."
docker stop "${CONTAINERS[@]}"

echo "Backing up data..."
tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" /mnt/user/appdata/vaultwarden
tar -czf "$BACKUP_DIR/gitea.tar.gz" /mnt/user/appdata/gitea
tar -czf "$BACKUP_DIR/npm.tar.gz" /mnt/user/appdata/NginxProxyManager

echo "✅ Backup complete: $BACKUP_DIR"
ls -lh "$BACKUP_DIR"
```

**Make Executable:**
```bash
chmod +x /mnt/user/scripts/backup-critical.sh
```

**Run Manually:**
```bash
/mnt/user/scripts/backup-critical.sh
```

**Schedule (User Scripts Plugin):**
- Frequency: Daily at 2 AM
- Retention: Keep last 30 days
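
The backup script creates a new timestamped directory on every run but never deletes old ones, so the 30-day retention needs its own step. A minimal sketch, assuming GNU `find` and that each backup lives in its own directory directly under the backup root; `prune_backups` is an illustrative name:

```bash
#!/bin/bash
# Delete backup directories last modified more than N days ago (default 30).
# Usage: prune_backups <backup_root> [days]
prune_backups() {
  local root=$1 days=${2:-30}
  # -maxdepth 1 limits the match to the timestamped directories themselves;
  # -print lists what is removed.
  find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -print -exec rm -rf {} +
}

# Hypothetical usage, e.g. as the last line of the backup script:
# prune_backups /mnt/user/backups 30
```

Running the `find` command without the `-exec` part first shows what would be deleted.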

---

**Restore from Backup:**

```bash
# Example: Restore Vaultwarden
docker stop vaultwarden

# Backup current (corrupted) data
mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old

# Extract backup
tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C /

# Restart container
docker start vaultwarden

# Verify working
curl http://192.168.68.51:4743
```
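
Before moving the live data aside, it may be worth confirming that the archive is readable and holds the expected directory. A sketch, assuming GNU `tar`; `verify_backup` is a hypothetical helper:

```bash
#!/bin/bash
# Succeed only if the archive lists cleanly and contains the expected path.
# Usage: verify_backup <archive.tar.gz> <expected_path_prefix>
verify_backup() {
  local archive=$1 prefix=${2#/}   # tar strips the leading "/" when archiving
  local listing
  listing=$(tar -tzf "$archive" 2>/dev/null) || return 1
  grep -q "^${prefix}" <<< "$listing"
}

# Hypothetical usage before the restore steps above:
# verify_backup /mnt/user/backups/20251031_120000/vaultwarden.tar.gz \
#   /mnt/user/appdata/vaultwarden && echo "archive looks restorable"
```

This only checks that the archive can be listed and has the right top-level path; it does not validate the application data inside.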

---

## ⚡ Quick Commands Reference

### System Status

```bash
# System uptime and load
uptime

# Resource usage
free -h
df -h

# Array status
mdcmd status

# Docker container summary
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Temperature (if sensors installed)
sensors

# Disk health quick check
smartctl -H /dev/sdb  # Parity
smartctl -H /dev/sdc  # Disk 1
```

### Docker Quick Commands

```bash
# Start every existing container (⚠️ this includes ones you left stopped)
docker start $(docker ps -aq)

# Stop all running containers
docker stop $(docker ps -q)

# View logs (last 50 lines)
docker logs --tail 50 <container_name>

# Follow logs in real-time
docker logs -f <container_name>

# Restart container
docker restart <container_name>

# Remove container (⚠️ will lose non-volume data!)
docker rm <container_name>

# Clean up unused resources
docker system prune              # Safe cleanup
docker system prune -a           # ⚠️ Removes unused images too!
docker system prune --volumes    # ⚠️ Removes unused volumes!
```

### Network Diagnostics

```bash
# Check all interfaces
ip addr show

# Test key infrastructure
ping -c 3 192.168.68.1      # Router
ping -c 3 192.168.68.51     # Unraid
ping -c 3 192.168.68.61     # Pi-hole
ping -c 3 8.8.8.8           # Internet

# DNS resolution test
nslookup google.com
nslookup google.com 192.168.68.61  # Test Pi-hole specifically

# Check listening ports
netstat -tulpn | grep LISTEN

# Test specific port
nc -zv 192.168.68.51 3002           # Example: Gitea
curl -I http://192.168.68.51:3002   # HTTP test
```

### Quick Health Check Script

```bash
#!/bin/bash
# Save as: /mnt/user/scripts/health-check.sh

echo "=== Unraid Health Check ==="
echo ""

echo "1. Array Status:"
mdcmd status | grep mdState

echo ""
echo "2. Running Containers:"
docker ps --format "table {{.Names}}\t{{.Status}}"

echo ""
echo "3. Disk Usage:"
df -h | grep -E "cache|disk1|Filesystem"

echo ""
echo "4. Network Connectivity:"
ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo "  Router: ✅ OK" || echo "  Router: ❌ FAIL"
ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo "  Internet: ✅ OK" || echo "  Internet: ❌ FAIL"
ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo "  Pi-hole: ✅ OK" || echo "  Pi-hole: ❌ FAIL"

echo ""
echo "5. Critical Services:"
# -f makes curl treat HTTP error responses (4xx/5xx) as failures
curl -sf http://localhost:4743 >/dev/null && echo "  Vaultwarden: ✅ OK" || echo "  Vaultwarden: ❌ DOWN"
curl -sf http://localhost:3002 >/dev/null && echo "  Gitea: ✅ OK" || echo "  Gitea: ❌ DOWN"
curl -sf http://localhost:7818 >/dev/null && echo "  NPM: ✅ OK" || echo "  NPM: ❌ DOWN"

echo ""
echo "=== Health Check Complete ==="
```

**Run:** `bash /mnt/user/scripts/health-check.sh`
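
The script above always exits 0, so a scheduler cannot tell a healthy run from a failed one. One way to address that, sketched here with an invented `check` helper and `FAILURES` counter, is to route every probe through a single function and exit with the failure count:

```bash
#!/bin/bash
FAILURES=0

# Run a command quietly and print a labelled OK/FAIL line.
check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "  $label: OK"
  else
    echo "  $label: FAIL"
    FAILURES=$((FAILURES + 1))
  fi
}

# Hypothetical usage with the same probes as the health check script:
# check "Router"      ping -c 2 192.168.68.1
# check "Vaultwarden" curl -sf http://localhost:4743
# exit "$FAILURES"    # non-zero when anything failed
```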

---

## 📞 Getting Help

### Pre-flight Checks

Before asking for help, gather this information:

1. **System Diagnostics**
   - Unraid WebGUI: Tools → Diagnostics → Download
   - Creates ZIP with all logs

2. **Container Logs**
   ```bash
   docker logs <container_name> > container-logs.txt
   ```

3. **Network Configuration**
   ```bash
   ip addr show > network-config.txt
   ip route show >> network-config.txt
   ```

4. **Disk Status**
   ```bash
   smartctl -a /dev/sdb > disk-smart.txt
   smartctl -a /dev/sdc >> disk-smart.txt
   ```

### Community Resources

- **Unraid Forums:** https://forums.unraid.net/
  - Post diagnostics ZIP
  - Be specific about symptoms
  - Include what you've tried

- **r/unraid:** https://reddit.com/r/unraid
  - Quick questions
  - Share diagnostics in pastebin

- **Discord:** Unraid Official Discord
  - Real-time help
  - Active community

### Emergency Contacts

```
ISP Support: [Your ISP Phone Number]
Unraid License: [Store in secure location]
USB Backup Location: [Document where stored]
Off-site Backup: [If applicable]
```

---

## 🎓 Post-Recovery Checklist

After restoring from disaster:

```
[ ] Unraid array started successfully
[ ] All critical services running
    [ ] NginxProxyManager
    [ ] Cloudflared
    [ ] Vaultwarden
    [ ] Gitea
[ ] Network connectivity verified
    [ ] Can access Unraid WebUI
    [ ] Can ping router (192.168.68.1)
    [ ] Internet working
    [ ] DNS resolving (Pi-hole)
[ ] Vaultwarden accessible (test password retrieval)
[ ] Gitea accessible (verify repositories intact)
[ ] NPM routing working (test reverse proxy)
[ ] Monitoring stack restarted
    [ ] Grafana
    [ ] InfluxDB
    [ ] Telegraf
[ ] External access working
    [ ] Tailscale connected
    [ ] Cloudflare tunnel active
[ ] Backups verified and up-to-date
[ ] Documentation updated with lessons learned
[ ] Incident documented in change log (Gitea)
```

---

## 🔒 Security After Recovery

**Immediately After Disaster Recovery:**

1. **Change Passwords** (if compromise suspected)
   ```
   [ ] Unraid root password
   [ ] Vaultwarden master password
   [ ] Container admin passwords
   [ ] Pi-hole admin password
   [ ] PiKVM password
   ```

2. **Review Access Logs**
   ```bash
   # Check SSH attempts (Unraid normally writes sshd messages to /var/log/syslog,
   # not /var/log/auth.log)
   grep "Failed password" /var/log/syslog | tail -50

   # Check NPM access
   docker logs NginxProxyManager | grep -i error

   # Check Gitea access
   docker logs Gitea | grep -i login
   ```

3. **Verify Firewall Rules**
   ```bash
   iptables -L -n -v
   ```

4. **Check for Unauthorized Changes**
   ```bash
   # Review Docker containers
   docker ps -a

   # Check cron jobs
   crontab -l

   # Review network interfaces
   ip addr show
   ```
|
||||
|
||||
---
|
||||
|
||||
## 📝 Documentation Updates After Incident
|
||||
|
||||
**What to Document:**
|
||||
|
||||
1. **What Happened:**
|
||||
- Date/time of incident
|
||||
- Symptoms observed
|
||||
- Root cause (if determined)
|
||||
- Duration of outage
|
||||
|
||||
2. **What You Did:**
|
||||
- Steps taken to recover
|
||||
- What worked / didn't work
|
||||
- Resources used (forums, docs, etc.)
|
||||
- Time to recovery
|
||||
|
||||
3. **Lessons Learned:**
|
||||
- What could prevent this in future
|
||||
- Process improvements needed
|
||||
- Documentation gaps discovered
|
||||
- Backup improvements needed
|
||||
|
||||
4. **Action Items:**
|
||||
- Backups to implement/improve
|
||||
- Monitoring to add
|
||||
- Scripts to create
|
||||
- Hardware to replace/upgrade
|
||||
|
||||
**Where to Document:**
|
||||
- Create incident report: `docs/incidents/YYYY-MM-DD-incident-name.md`
|
||||
- Update this quick-start guide with new procedures
|
||||
- Add to troubleshooting section if recurring issue
|
||||
- Commit to Gitea with detailed message
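
To make the incident report quick to start, the four headings above can be stamped into a dated file. This is a sketch only: `new_incident_report` is a hypothetical helper, and it assumes you run it from the root of the documentation repository.

```bash
#!/usr/bin/env bash
# Create <dir>/YYYY-MM-DD-<name>.md with the four report headings.
# Prints the path of the new file; refuses to overwrite an existing report.
new_incident_report() {
  local name="$1"
  local dir="${2:-docs/incidents}"
  local file
  file="${dir}/$(date +%Y-%m-%d)-${name}.md"
  mkdir -p "$dir"
  if [ -e "$file" ]; then
    echo "Report already exists: $file" >&2
    return 1
  fi
  cat > "$file" <<EOF
# Incident: ${name}

## What Happened

## What You Did

## Lessons Learned

## Action Items
EOF
  echo "$file"
}

# Example usage:
#   new_incident_report usb-boot-failure
```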
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Normal Startup Sequence
|
||||
|
||||
**From Cold Boot:**
|
||||
|
||||
```
|
||||
1. Power on server
|
||||
↓
|
||||
2. BIOS POST (~30 seconds)
|
||||
- Hardware check
|
||||
- Memory test
|
||||
- Drive detection
|
||||
↓
|
||||
3. Unraid boots from USB (~1-2 minutes)
|
||||
- Linux kernel loads
|
||||
- Unraid OS starts
|
||||
↓
|
||||
4. Network initializes
|
||||
- br0 interface up
|
||||
- Gets IP: 192.168.68.51
|
||||
↓
|
||||
5. Array auto-starts (if configured)
|
||||
- Parity disk: sdb
|
||||
- Data disk: sdc
|
||||
- Cache: nvme1n1p1
|
||||
↓
|
||||
6. Docker service starts
|
||||
- docker0 bridge created
|
||||
- Networks initialized
|
||||
↓
|
||||
7. Containers auto-start (if enabled)
|
||||
- Infrastructure services first
|
||||
- Then application services
|
||||
↓
|
||||
8. Services available (~3-5 minutes total)
|
||||
✅ Ready to use!
|
||||
```
|
||||
|
||||
**Expected Boot Time:** 3-5 minutes
|
||||
**If Taking Longer:** Check system log for errors
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Quick Health Check Command
|
||||
|
||||
**Run After Any Restart:**
|
||||
|
||||
```bash
# Quick health check (each step reports independently)
docker ps --format "table {{.Names}}\t{{.Status}}"; \
df -h | grep -E "cache|disk1"; \
if ping -c 2 192.168.68.1 >/dev/null; then echo "Network: OK"; else echo "Network: FAIL"; fi
```
|
||||
|
||||
---
|
||||
|
||||
## 📚 Related Documentation
|
||||
|
||||
- **Network Issues:** See `network-map.md`
|
||||
- **Service Details:** See `service-inventory.md`
|
||||
- **Container Configs:** See `docker-compose/` (when created)
|
||||
- **Main Overview:** See `README.md`
|
||||
|
||||
---
|
||||
|
||||
## 🆘 True Emergency - Complete System Down
|
||||
|
||||
**If everything is down and you need immediate help:**
|
||||
|
||||
1. **Access via PiKVM**
|
||||
- https://192.168.68.53
|
||||
- Get console access
|
||||
- View what's happening
|
||||
|
||||
2. **Check Physical Server**
|
||||
- Power LED on?
|
||||
- Fans spinning?
|
||||
- Drives spinning up?
|
||||
- Network activity lights?
|
||||
|
||||
3. **Try Safe Mode Boot**
|
||||
- Boot Unraid in Safe Mode (plugins disabled; pick the GUI Safe Mode entry if you want a local desktop)
|
||||
- Diagnose from console
|
||||
|
||||
4. **Community Help**
|
||||
- Unraid Discord (fastest response)
|
||||
- Forums with diagnostics ZIP
|
||||
- r/unraid for quick questions
|
||||
|
||||
5. **Document Everything**
|
||||
- Take photos/screenshots via PiKVM
|
||||
- Note exact error messages
|
||||
- Record what you tried
|
||||
- Timeline of events
|
||||
|
||||
---
|
||||
|
||||
## 💡 Pro Tips
|
||||
|
||||
1. **Test Your Backups**
|
||||
- Restore test annually
|
||||
- Verify data integrity
|
||||
- Practice recovery procedures
|
||||
|
||||
2. **Keep This Guide Accessible**
|
||||
- Save offline copy to phone/laptop
|
||||
- Print critical sections
|
||||
- Bookmark in browser
|
||||
|
||||
3. **Automate Where Possible**
|
||||
- Schedule backup scripts
|
||||
- Set up monitoring alerts
|
||||
- Use User Scripts plugin
|
||||
|
||||
4. **Document As You Go**
|
||||
- Update after fixing issues
|
||||
- Add new procedures discovered
|
||||
- Note what worked/didn't work
|
||||
|
||||
---
|
||||
|
||||
**Last Updated:** October 31, 2025
|
||||
**Next Review:** Quarterly or after incidents
|
||||
**Maintained By:** Weston
|
||||
|
||||
---
|
||||
|
||||
**Remember:** Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help!
|
||||
|
||||
**Keep this guide accessible even when the server is down!**
|
||||
💡 **Pro Tip:** Save a copy to your phone/laptop/OneDrive!
|
||||
|
||||
🚀 **You've got this!**
|
||||
614
docs/service-inventory.md
Normal file
@@ -0,0 +1,614 @@
|
||||
# 📦 Service Inventory - Complete Container Catalog
|
||||
|
||||
**Last Updated:** October 31, 2025
|
||||
**Total Containers:** 32 (6 running, 26 stopped)
|
||||
**Purpose:** Comprehensive catalog of all services
|
||||
|
||||
---
|
||||
|
||||
## 📊 Quick Stats
|
||||
|
||||
| Metric | Value | Status |
|--------|-------|--------|
| **Total Containers** | 32 | - |
| **Running** | 6 | ✅ 19% |
| **Stopped** | 26 | ⚠️ 81% |
| **Total Docker Images** | ~50GB | ⚠️ High |
| **Cache Usage** | 578GB / 932GB | ⚠️ 63% |
|
||||
|
||||
**Key Insight:** 81% of containers are stopped - cleanup opportunity!
|
||||
|
||||
---
|
||||
|
||||
## 🟢 Running Services (6 containers)
|
||||
|
||||
### 1. open-webui ⭐⭐⭐
|
||||
|
||||
**Status:** Running (healthy)
|
||||
**Container:** open-webui
|
||||
**Image:** ghcr.io/open-webui/open-webui:main (4.55GB)
|
||||
**Created:** 2025-10-16 (2 weeks ago)
|
||||
**Network:** bridge (172.17.0.5)
|
||||
**Ports:** 8080 → 3000
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.15%
|
||||
- Memory: 1.026GB / 60.55GB (1.69%)
|
||||
- Storage: 42.4MB
|
||||
|
||||
**Purpose:** LLM chat interface (ChatGPT-like UI for local models)
|
||||
|
||||
**Dependencies:**
|
||||
- ollama (currently STOPPED ❌)
|
||||
- OpenAI API key (configured)
|
||||
|
||||
**Access:**
|
||||
- Local: http://192.168.68.51:3000
|
||||
- No authentication by default
|
||||
|
||||
**Issues:**
|
||||
- ⚠️ Depends on ollama container which is stopped
|
||||
- ⚠️ OpenAI API key exposed in environment variables
|
||||
|
||||
**Recommendations:**
|
||||
1. ✅ **KEEP** - Active LLM interface
|
||||
2. Restart ollama container to enable local models
|
||||
3. Move API keys to Docker secrets
|
||||
4. Enable authentication
|
||||
|
||||
**Priority:** HIGH - Core AI/ML service
|
||||
|
||||
---
|
||||
|
||||
### 2. NginxProxyManager ⭐⭐⭐
|
||||
|
||||
**Status:** Running
|
||||
**Container:** NginxProxyManager
|
||||
**Image:** jlesage/nginx-proxy-manager (189MB)
|
||||
**Created:** 2025-10-11 (3 weeks ago)
|
||||
**Network:** bridge (172.17.0.4)
|
||||
**Ports:** 4443→18443, 8080→1880, 8181→7818
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.08%
|
||||
- Memory: 77.45MB (0.12%)
|
||||
- Storage: 13.4KB
|
||||
|
||||
**Purpose:** Reverse proxy with web UI - SSL termination and routing
|
||||
|
||||
**Dependencies:** None
|
||||
|
||||
**Access:**
|
||||
- Admin UI: http://192.168.68.51:7818
|
||||
- HTTP: http://192.168.68.51:1880
|
||||
- HTTPS: https://192.168.68.51:18443
|
||||
|
||||
**Configuration:**
|
||||
- Routes traffic to backend services
|
||||
- Manages SSL certificates
|
||||
- Provides access control
|
||||
|
||||
**Recommendations:**
|
||||
1. ✅ **KEEP** - Critical infrastructure
|
||||
2. Document all proxy rules in Gitea
|
||||
3. Verify SSL auto-renewal is configured
|
||||
4. Enable MFA if available
|
||||
5. Review access logs regularly
|
||||
|
||||
**Priority:** CRITICAL - Core infrastructure
|
||||
|
||||
---
|
||||
|
||||
### 3. Gitea ⭐⭐⭐
|
||||
|
||||
**Status:** Running
|
||||
**Container:** Gitea
|
||||
**Image:** gitea/gitea (180MB)
|
||||
**Created:** 2025-10-08 (3 weeks ago)
|
||||
**Network:** bridge (172.17.0.3)
|
||||
**Ports:** 22→22, 3000→3002
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.11%
|
||||
- Memory: 114.5MB (0.18%)
|
||||
- Storage: 113MB (active repositories!)
|
||||
|
||||
**Purpose:** Self-hosted Git server (GitHub alternative)
|
||||
|
||||
**Dependencies:** None (internal SQLite)
|
||||
|
||||
**Access:**
|
||||
- Web: http://192.168.68.51:3002
|
||||
- Domain: https://gitea.segelschiff.app
|
||||
- SSH: ssh://192.168.68.51:22 ⚠️ (conflicts with Unraid SSH)
|
||||
|
||||
**Configuration:**
|
||||
- Using latest tag (unpinned version)
|
||||
- Storage: /mnt/user/appdata/gitea
|
||||
|
||||
**Issues:**
|
||||
- ⚠️ SSH port 22 conflicts with Unraid SSH
|
||||
- ⚠️ Using `latest` tag (version not pinned)
|
||||
- ⚠️ Backup strategy unknown
|
||||
|
||||
**Recommendations:**
|
||||
1. ✅ **KEEP** - Critical for version control
|
||||
2. Change SSH port to 2222 to avoid conflict
|
||||
3. Pin to specific version tag
|
||||
4. Implement automated backups (CRITICAL!)
|
||||
5. This is your version control hub - protect it!
|
||||
|
||||
**Priority:** CRITICAL - Infrastructure documentation depends on this
|
||||
|
||||
---
|
||||
|
||||
### 4. ApacheGuacamole ⭐⭐
|
||||
|
||||
**Status:** Running (2+ months uptime!)
|
||||
**Container:** ApacheGuacamole
|
||||
**Image:** jasonbean/guacamole (737MB)
|
||||
**Created:** 2025-08-22 (2+ months ago)
|
||||
**Network:** bridge (172.17.0.2)
|
||||
**Ports:** 8080→4000
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.16%
|
||||
- Memory: 785.8MB (1.27%)
|
||||
- Storage: 46.2MB
|
||||
|
||||
**Purpose:** Clientless remote desktop gateway (RDP/VNC/SSH via browser)
|
||||
|
||||
**Dependencies:**
|
||||
- MariaDB (STOPPED ❌) - **BROKEN DEPENDENCY!**
|
||||
|
||||
**Access:**
|
||||
- Web: http://192.168.68.51:4000
|
||||
|
||||
**Configuration:**
|
||||
- MySQL enabled but MariaDB stopped
|
||||
- Multiple auth modules: MySQL, LDAP, TOTP, etc.
|
||||
|
||||
**Issues:**
|
||||
- 🚨 **CRITICAL:** Depends on MariaDB which is stopped!
|
||||
- Currently using embedded database (not recommended)
|
||||
- Data loss risk without proper database backend
|
||||
|
||||
**Recommendations:**
|
||||
1. ⚠️ **FIX IMMEDIATELY** - Restart MariaDB or reconfigure
|
||||
2. If keeping: Start MariaDB and verify connection
|
||||
3. If not using: Stop Guacamole and remove both
|
||||
4. Document your use case for remote desktop access
|
||||
|
||||
**Priority:** MEDIUM - Fix dependency or remove
|
||||
|
||||
---
|
||||
|
||||
### 5. Cloudflared ⭐⭐⭐
|
||||
|
||||
**Status:** Running (2.5+ months - very stable!)
|
||||
**Container:** Unraid-Cloudflared-Tunnel
|
||||
**Image:** figro/unraid-cloudflared-tunnel (8.92MB)
|
||||
**Created:** 2025-08-10 (2.5+ months ago)
|
||||
**Network:** bridge (172.17.0.6)
|
||||
**Ports:** 46495→46495 (metrics)
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.33% (highest of running containers)
|
||||
- Memory: 68.6MB (0.11%)
|
||||
- Network I/O: 41.7MB RX / 310KB TX
|
||||
|
||||
**Purpose:** Cloudflare Tunnel - secure external access without port forwarding
|
||||
|
||||
**Dependencies:** None
|
||||
|
||||
**Access:**
|
||||
- Metrics: http://192.168.68.51:46495
|
||||
- Domain: *.segelschiff.app (managed via Cloudflare)
|
||||
|
||||
**Configuration:**
|
||||
- Tunnel token configured
|
||||
- No auto-update enabled
|
||||
- Metrics exposed for monitoring
|
||||
|
||||
**Security:**
|
||||
- ⚠️ Tunnel token in plain text environment variable
|
||||
- ✅ No open ports on router (excellent!)
|
||||
|
||||
**Recommendations:**
|
||||
1. ✅ **KEEP** - Excellent security practice
|
||||
2. Rotate tunnel token periodically
|
||||
3. Document which services are exposed
|
||||
4. Integrate metrics with monitoring stack
|
||||
|
||||
**Priority:** HIGH - Critical for secure remote access
|
||||
|
||||
---
|
||||
|
||||
### 6. Vaultwarden ⭐⭐⭐
|
||||
|
||||
**Status:** Running (healthy) - 3+ months uptime!
|
||||
**Container:** vaultwarden
|
||||
**Image:** vaultwarden/server (256MB)
|
||||
**Created:** 2025-07-31 (3+ months ago)
|
||||
**Network:** bridge (172.17.0.7)
|
||||
**Ports:** 80→4743
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.00% (idle)
|
||||
- Memory: 24.96MB (0.04%) - Very lightweight!
|
||||
|
||||
**Purpose:** Self-hosted password manager (Bitwarden compatible)
|
||||
|
||||
**Dependencies:** None
|
||||
|
||||
**Access:**
|
||||
- Web: http://192.168.68.51:4743
|
||||
- Admin: http://192.168.68.51:4743/admin
|
||||
|
||||
**Configuration:**
|
||||
- Signups allowed: true ⚠️
|
||||
- Invitations allowed: false ✅
|
||||
- WebSocket disabled ⚠️
|
||||
- Admin token exposed ⚠️
|
||||
|
||||
**Issues:**
|
||||
- 🚨 **CRITICAL:** No backup strategy evident!
|
||||
- ⚠️ Admin token in plain text
|
||||
- ⚠️ Signups open (verify intentional)
|
||||
- ⚠️ WebSocket disabled (reduces functionality)
|
||||
|
||||
**Recommendations:**
|
||||
1. ✅ **KEEP** - Critical security infrastructure
|
||||
2. 🚨 **IMPLEMENT BACKUP IMMEDIATELY** - This is your password vault!
|
||||
3. Close signups after initial setup
|
||||
4. Rotate admin token and use secrets management
|
||||
5. Enable WebSocket for better sync
|
||||
6. Automate daily backups to off-site location
|
||||
|
||||
**Priority:** CRITICAL - Contains all your passwords!
|
||||
|
||||
---
|
||||
|
||||
## 🔴 Recently Stopped Services (Worth Investigating)
|
||||
|
||||
### 7. ollama ⚠️
|
||||
|
||||
**Status:** Exited (128) 4 minutes ago
|
||||
**Image:** ollama/ollama (3.33GB)
|
||||
**Purpose:** Local LLM inference engine
|
||||
|
||||
**Why It Matters:** open-webui depends on this!
|
||||
|
||||
**Recommendations:**
|
||||
1. 🔧 **RESTART** - Required for open-webui local models
|
||||
2. Investigate exit code 128 (configuration issue?)
|
||||
3. Configure GPU acceleration (RTX 4090!)
|
||||
4. Test with open-webui after restart
|
||||
|
||||
**Action:** `docker start ollama && docker logs -f ollama`
|
||||
|
||||
---
|
||||
|
||||
### 8. Monitoring Stack (Stopped 12 days ago) 🚨
|
||||
|
||||
**Containers:**
|
||||
- Grafana (stopped 12 days)
|
||||
- InfluxDB (stopped 12 days)
|
||||
- Telegraf (stopped 12 days)
|
||||
|
||||
**Total Size:** ~1.7GB
|
||||
|
||||
**Why Critical:** Zero observability into system health!
|
||||
|
||||
**Recommendations:**
|
||||
1. 🚨 **RESTART IMMEDIATELY** - Priority 1!
|
||||
2. Configure dashboards for:
|
||||
- Docker container stats
|
||||
- System resources (CPU, RAM, disk)
|
||||
- Network traffic
|
||||
- Temperature sensors
|
||||
3. Set up alerting for critical issues
|
||||
4. Document in runbook
|
||||
|
||||
**Action:**
|
||||
```bash
|
||||
docker start Influxdb
|
||||
sleep 15 # Wait for DB initialization
|
||||
docker start Telegraf
|
||||
docker start Grafana
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 9. MariaDB (Stopped 12 days ago) ⚠️
|
||||
|
||||
**Status:** Exited (0) 12 days ago
|
||||
**Image:** lscr.io/linuxserver/mariadb (348MB)
|
||||
**Purpose:** MySQL database for Guacamole
|
||||
|
||||
**Issue:** Guacamole is running but database is stopped!
|
||||
|
||||
**Recommendations:**
|
||||
1. If using Guacamole: **RESTART**
|
||||
2. If not using Guacamole: **REMOVE BOTH**
|
||||
3. Document decision
|
||||
|
||||
---
|
||||
|
||||
### 10. Database Admin Tools (Stopped 12 days ago)
|
||||
|
||||
**CloudBeaver** - Stopped 12 days
|
||||
**adminer** - Stopped 12 days
|
||||
|
||||
**Issue:** Two database admin tools - redundant!
|
||||
|
||||
**Recommendations:**
|
||||
1. **CHOOSE ONE:**
|
||||
- CloudBeaver: Feature-rich (725MB)
|
||||
- adminer: Lightweight (118MB)
|
||||
2. Remove the other
|
||||
3. Only restart if you need database management
|
||||
|
||||
---
|
||||
|
||||
## 🟡 Experimental / Inactive Services (Decision Needed)
|
||||
|
||||
### 11. Nextcloud AIO Stack (7 containers!) 🚨
|
||||
|
||||
**Status:** All stopped 3 weeks ago
|
||||
**Total Size:** ~7GB Docker images + data
|
||||
**Containers:**
|
||||
- nextcloud-aio-mastercontainer
|
||||
- nextcloud-aio-apache
|
||||
- nextcloud-aio-nextcloud (2.19GB)
|
||||
- nextcloud-aio-database (PostgreSQL)
|
||||
- nextcloud-aio-redis
|
||||
- nextcloud-aio-onlyoffice (3.79GB!)
|
||||
- nextcloud-aio-imaginary
|
||||
- nextcloud-aio-notify-push
|
||||
|
||||
**Data:** /mnt/user/nextcloud (~1GB+)
|
||||
|
||||
**Analysis:**
|
||||
- Massive resource footprint
|
||||
- "All-in-One" = heavy coupling
|
||||
- Stopped for 3 weeks suggests not critical
|
||||
|
||||
**Recommendations:**
|
||||
**DECISION REQUIRED:**
|
||||
|
||||
**Option A: Remove Everything**
|
||||
```bash
# Backup data first!
cp -r /mnt/user/nextcloud /mnt/user/backup/nextcloud-$(date +%Y%m%d)

# Remove containers (docker rm does not expand wildcards, so select them with a name filter)
docker ps -aq --filter "name=nextcloud-aio-" | xargs -r docker rm

# Remove images to free space
docker rmi $(docker images | grep nextcloud | awk '{print $3}')

# Archive data
tar -czf nextcloud-data-backup.tar.gz /mnt/user/nextcloud
```
|
||||
**Saves:** ~7GB+ space
|
||||
|
||||
**Option B: Keep and Restart**
|
||||
- Document why you need it
|
||||
- Create restart procedure
|
||||
- Implement backup strategy
|
||||
- Monitor resource usage
|
||||
|
||||
**My Recommendation:** Remove unless actively needed. Nextcloud is great but this All-in-One stack is heavy.
|
||||
|
||||
---
|
||||
|
||||
### 12. Jellyfin (Stopped 2 weeks ago) ⚠️
|
||||
|
||||
**Status:** Exited (0) 2 weeks ago
|
||||
**Image:** jellyfin/jellyfin (1.25GB)
|
||||
**GPU:** RTX 4090 allocated but idle!
|
||||
|
||||
**Media:**
|
||||
- Movies: /mnt/user/movies
|
||||
- TV: /mnt/user/tv shows
|
||||
- Music: /mnt/user/music
|
||||
|
||||
**Issue:** $1600 GPU sitting idle!
|
||||
|
||||
**Recommendations:**
|
||||
**If you want media server:**
|
||||
1. **RESTART** with hardware transcoding:
|
||||
```bash
|
||||
docker start Jellyfin
|
||||
```
|
||||
2. Configure NVENC/NVDEC for RTX 4090
|
||||
3. Test 4K transcoding performance
|
||||
4. Switch from `host` network to bridge (security)
|
||||
|
||||
**If you don't need media server:**
|
||||
1. Remove GPU allocation from container
|
||||
2. Free GPU for other projects (AI/ML)
|
||||
|
||||
**Action Required:** Decide on media server strategy
|
||||
|
||||
---
|
||||
|
||||
### 13. Large AI/ML Containers (Rarely Used)
|
||||
|
||||
**ebook2audiobook** - 20.06GB! (stopped 3 weeks)
|
||||
**docling-serve** - 14.45GB! (stopped 2 weeks)
|
||||
|
||||
**Total:** 34.5GB for two containers!
|
||||
|
||||
**Analysis:**
|
||||
- Massive images
|
||||
- Rarely used (stopped weeks ago)
|
||||
- Experimental/one-time use?
|
||||
|
||||
**Recommendations:**
|
||||
1. **REMOVE** both to free 34.5GB
|
||||
2. If needed again, pull fresh images
|
||||
3. Document use cases if keeping
|
||||
|
||||
**Potential Savings:** 34.5GB cache space!
|
||||
|
||||
---
|
||||
|
||||
### 14. Productivity Suite (Multiple Stopped)
|
||||
|
||||
**baserow** - Stopped 2 weeks (2.25GB)
|
||||
**NocoDB** - Stopped 3 weeks (588MB)
|
||||
**OpenProject** - Stopped 7 weeks (2.87GB)
|
||||
|
||||
**Issue:** Three overlapping tools (two no-code databases and one project manager) - redundant!
|
||||
|
||||
**Recommendations:**
|
||||
1. **CHOOSE ONE** (or none if not used)
|
||||
2. Remove the others
|
||||
3. Migrate data if needed first
|
||||
|
||||
**Potential Savings:** ~5GB
|
||||
|
||||
---
|
||||
|
||||
### 15. Development Tools
|
||||
|
||||
**n8n** (workflow automation) - Created but never started
|
||||
**steam-headless** - Created but not running
|
||||
|
||||
**Recommendations:**
|
||||
- Document if you have plans for these
|
||||
- Remove if experimental and abandoned
|
||||
|
||||
---
|
||||
|
||||
## 📋 Container Decision Matrix
|
||||
|
||||
| Container | Keep? | Action | Priority |
|-----------|-------|--------|----------|
| **open-webui** | ✅ Yes | Keep running, restart ollama | HIGH |
| **NginxProxyManager** | ✅ Yes | Keep, document configs | CRITICAL |
| **Gitea** | ✅ Yes | Keep, fix SSH port, backup | CRITICAL |
| **ApacheGuacamole** | ⚠️ Decide | Fix MariaDB OR remove both | MEDIUM |
| **Cloudflared** | ✅ Yes | Keep, rotate token | HIGH |
| **Vaultwarden** | ✅ Yes | Keep, BACKUP NOW! | CRITICAL |
| **ollama** | ✅ Yes | Restart immediately | HIGH |
| **Monitoring Stack** | ✅ Yes | Restart all 3 containers | CRITICAL |
| **MariaDB** | ⚠️ Conditional | If Guacamole stays | MEDIUM |
| **Nextcloud AIO (7)** | ❌ Remove | Backup data, remove stack | LOW |
| **Jellyfin** | ⚠️ Decide | Use GPU or remove | MEDIUM |
| **ebook2audiobook** | ❌ Remove | Free 20GB | LOW |
| **docling-serve** | ❌ Remove | Free 14.5GB | LOW |
| **baserow/NocoDB/OpenProject** | ❌ Choose 1 | Remove others | LOW |
| **CloudBeaver/adminer** | ⚠️ Choose 1 | Keep one DB admin | LOW |
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Recommended Action Plan
|
||||
|
||||
### Phase 1: Critical (Do First!) 🚨
|
||||
|
||||
1. **Backup Vaultwarden** (30 min)
|
||||
```bash
|
||||
docker stop vaultwarden
|
||||
tar -czf vaultwarden-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/vaultwarden
|
||||
docker start vaultwarden
|
||||
```
|
||||
|
||||
2. **Backup Gitea** (30 min)
|
||||
```bash
|
||||
docker stop Gitea
|
||||
tar -czf gitea-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/gitea
|
||||
docker start Gitea
|
||||
```
|
||||
|
||||
3. **Restart Monitoring Stack** (15 min)
|
||||
```bash
|
||||
docker start Influxdb && sleep 15
|
||||
docker start Telegraf Grafana
|
||||
# Configure dashboards
|
||||
```
|
||||
|
||||
4. **Restart ollama** (5 min)
|
||||
```bash
|
||||
docker start ollama
|
||||
docker logs -f ollama
|
||||
```
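
The stop/tar/start pattern in steps 1 and 2 leaves the service down if `tar` fails and writes the archive to whatever directory you happen to be in. A hedged sketch of a reusable wrapper follows; `backup_container` is a hypothetical helper and the destination directory in the usage comments is an assumption.

```bash
#!/usr/bin/env bash
# Stop a container, archive its appdata directory, and always restart it.
# Arguments: container name, appdata path, destination directory.
backup_container() {
  local name="$1" src="$2" dest="$3"
  local archive status=0
  archive="${dest}/${name}-backup-$(date +%Y%m%d).tar.gz"
  mkdir -p "$dest"
  docker stop "$name" >/dev/null || return 1
  # -C stores paths relative to the parent directory instead of absolute paths.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || status=$?
  # Restart even if tar failed, so a bad backup does not become an outage.
  docker start "$name" >/dev/null || status=1
  if [ "$status" -eq 0 ]; then
    echo "$archive"
  fi
  return "$status"
}

# Example usage:
#   backup_container vaultwarden /mnt/user/appdata/vaultwarden /mnt/user/backup
#   backup_container Gitea       /mnt/user/appdata/gitea       /mnt/user/backup
```

The function prints the archive path on success, which makes it easy to copy the file off-site in a follow-up step.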
|
||||
|
||||
### Phase 2: Cleanup (Free Space!) 💾
|
||||
|
||||
5. **Remove Large Unused Containers** (1 hour)
|
||||
- ebook2audiobook (20GB)
|
||||
- docling-serve (14.5GB)
|
||||
- Nextcloud AIO stack (7GB)
|
||||
- **Saves: ~41GB!**
|
||||
|
||||
6. **Docker System Cleanup**
|
||||
```bash
|
||||
docker system prune -a
|
||||
# Free unused images and build cache
|
||||
```
|
||||
|
||||
### Phase 3: Decisions (This Week)
|
||||
|
||||
7. **Guacamole + MariaDB** - Keep or remove?
|
||||
8. **Jellyfin** - Restart with GPU or remove?
|
||||
9. **Productivity tools** - Choose one, remove others
|
||||
10. **Database admin** - CloudBeaver or adminer?
|
||||
|
||||
---
|
||||
|
||||
## 📊 Storage Cleanup Impact
|
||||
|
||||
**Current Cache Usage:** 578GB / 932GB (63%)
|
||||
|
||||
**After Recommended Cleanup:**
|
||||
- Remove ebook2audiobook: -20GB
|
||||
- Remove docling-serve: -14.5GB
|
||||
- Remove Nextcloud AIO: -7GB
|
||||
- Docker system prune: ~10-20GB
|
||||
- **Total Freed: ~50-60GB**
|
||||
|
||||
**New Cache Usage:** ~520GB / 932GB (56%) ✅
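
The projected figure can be reproduced with a little arithmetic. This sketch uses the numbers from this section (578GB used, 932GB total, 50-60GB freed); `projected_usage` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print projected cache usage after freeing some space.
# Arguments: used GB, total GB, freed GB.
projected_usage() {
  awk -v used="$1" -v total="$2" -v freed="$3" \
    'BEGIN { left = used - freed; printf "%dGB / %dGB (%.1f%%)\n", left, total, 100 * left / total }'
}

projected_usage 578 932 50   # low estimate  -> 528GB / 932GB (56.7%)
projected_usage 578 932 60   # high estimate -> 518GB / 932GB (55.6%)
```

Both ends of the estimate land close to the ~56% quoted above.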
|
||||
|
||||
---
|
||||
|
||||
## 🔐 Security Recommendations
|
||||
|
||||
1. **Secrets Management** - Stop using plain text env vars
|
||||
2. **Close Open Signups** - Vaultwarden signups should be closed
|
||||
3. **SSH Port Conflict** - Fix Gitea port 22 conflict
|
||||
4. **Network Mode** - Move Jellyfin from `host` to `bridge`
|
||||
5. **Version Pinning** - Stop using `latest` tags
|
||||
|
||||
---
|
||||
|
||||
## 📈 Resource Summary
|
||||
|
||||
**Docker Images Total:** ~50GB
|
||||
**Container Data:** Varies by appdata
|
||||
**Cache Impact:** High (63% full)
|
||||
|
||||
**Top Resource Consumers (Images):**
|
||||
1. ebook2audiobook: 20.06GB
|
||||
2. docling-serve: 14.45GB
|
||||
3. Nextcloud stack: ~7GB
|
||||
4. open-webui: 4.55GB
|
||||
5. ollama: 3.33GB
|
||||
|
||||
---
|
||||
|
||||
## 🎓 Key Takeaways
|
||||
|
||||
1. **6 services are your core** - Keep these running
|
||||
2. **26 stopped containers** - Cleanup opportunity
|
||||
3. **~40GB can be freed** by removing unused containers alone (~50-60GB including a Docker system prune)
|
||||
4. **No monitoring** - Critical gap (restart Grafana stack!)
|
||||
5. **Backup critical** - Vaultwarden and Gitea MUST be backed up
|
||||
|
||||
---
|
||||
|
||||
**Last Updated:** October 31, 2025
|
||||
**Next Review:** After cleanup actions completed
|
||||
**Maintained By:** Weston
|
||||
954
quick-start.md
Normal file
@@ -0,0 +1,954 @@
|
||||
# 🚀 Quick Start & Emergency Recovery Guide
|
||||
|
||||
**Purpose:** Get your homelab back online quickly after disaster
|
||||
**Target Time:** 30-60 minutes to basic functionality
|
||||
**Last Updated:** October 31, 2025
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Quick Access Reference
|
||||
|
||||
### Essential URLs
|
||||
|
||||
| Service | URL | Default Credentials |
|---------|-----|---------------------|
| **Unraid Dashboard** | http://192.168.68.51 | root / (your password) |
| **Gitea** | https://gitea.segelschiff.app | Weston / (your password) |
| **Vaultwarden** | http://192.168.68.51:4743 | Master password |
| **NPM Admin** | http://192.168.68.51:7818 | admin@example.com / changeme (first login) |
| **Pi-hole** | http://192.168.68.61/admin | (your password) |
| **PiKVM** | https://192.168.68.53 | admin / admin (default) |
|
||||
|
||||
### SSH Access
|
||||
|
||||
```bash
|
||||
# Local network
|
||||
ssh root@192.168.68.51
|
||||
|
||||
# Via Tailscale (from anywhere)
|
||||
ssh root@100.122.220.126
|
||||
|
||||
# Emergency: Use PiKVM for console access
|
||||
# https://192.168.68.53
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🆘 Emergency Recovery Scenarios
|
||||
|
||||
### Scenario 1: Server Won't Boot 🚨
|
||||
|
||||
**Symptoms:**
|
||||
- No network connectivity to 192.168.68.51
|
||||
- Unraid WebUI unreachable
|
||||
- No response to ping
|
||||
|
||||
**Recovery Steps:**
|
||||
|
||||
1. **Physical Check** (via PiKVM or in person)
|
||||
```
|
||||
[ ] Server has power (check LED)
|
||||
[ ] Network cable connected to eth0
|
||||
[ ] Monitor shows output (via PiKVM)
|
||||
[ ] USB boot drive is present and detected
|
||||
```
|
||||
|
||||
2. **Use PiKVM for Remote Console**
|
||||
- Access: https://192.168.68.53
|
||||
- Login: admin / admin
|
||||
- View boot process
|
||||
- Check BIOS/boot messages
|
||||
|
||||
3. **Common Boot Issues**
|
||||
|
||||
**USB Boot Drive Failure** (Most common!)
|
||||
```
|
||||
Symptoms: "Boot device not found" or similar
|
||||
|
||||
Fix:
|
||||
1. Have backup USB ready
|
||||
2. Shut down server (via PiKVM power control)
|
||||
3. Replace USB boot drive
|
||||
4. Power on
|
||||
5. Restore configuration from backup
|
||||
```
|
||||
|
||||
**BIOS Settings Changed**
|
||||
```
|
||||
Fix:
|
||||
1. Enter BIOS (DEL/F2 during boot)
|
||||
2. Load defaults
|
||||
3. Verify boot order (USB first)
|
||||
4. Save and exit
|
||||
```
|
||||
|
||||
**Hardware Failure**
|
||||
```
|
||||
Check:
|
||||
1. RAM seated properly
|
||||
2. All drives detected in BIOS
|
||||
3. CPU fan spinning
|
||||
4. No error beeps
|
||||
```
|
||||
|
||||
4. **Boot from Backup USB**
|
||||
```
|
||||
Steps:
|
||||
1. Power off server
|
||||
2. Insert backup USB boot drive
|
||||
3. Power on
|
||||
4. Verify boot successful
|
||||
5. Restore configuration:
|
||||
- Tools → Flash Backup → Browse → Select backup ZIP
|
||||
- Reboot
|
||||
```
|
||||
|
||||
**Prevention:**
|
||||
- ✅ Keep USB flash backup updated (weekly)
|
||||
- ✅ Store backup USB in safe location
|
||||
- ✅ Document BIOS settings (screenshots via PiKVM)
|
||||
|
||||
---
|
||||
|
||||
### Scenario 2: Lost Admin Password
|
||||
|
||||
**Unraid Root Password Reset:**
|
||||
|
||||
1. **Via PiKVM Console**
|
||||
```
|
||||
1. Access PiKVM: https://192.168.68.53
|
||||
2. View console in browser
|
||||
3. Wait for login prompt
|
||||
4. Press Ctrl+Alt+F2 (via PiKVM keyboard)
|
||||
5. At terminal: passwd root
|
||||
6. Enter new password twice
|
||||
7. Press Ctrl+Alt+F1 to return to GUI
|
||||
8. Update documentation
|
||||
```
|
||||
|
||||
2. **Via Physical Access**
|
||||
```
|
||||
1. Connect monitor and keyboard to server
|
||||
2. Press Ctrl+Alt+F2
|
||||
3. Run: passwd root
|
||||
4. Set new password
|
||||
5. Press Ctrl+Alt+F1
|
||||
```
|
||||
|
||||
**Container Passwords:**
|
||||
- Check `/mnt/user/appdata/<service>/config`
|
||||
- Review environment variables in Docker templates
|
||||
- Use Vaultwarden if accessible
|
||||
- Check this documentation repo in Gitea
|
||||
|
||||
---
|
||||
|
||||
### Scenario 3: Container Won't Start
|
||||
|
||||
**Quick Diagnosis:**
|
||||
|
||||
```bash
|
||||
# Check container status
|
||||
docker ps -a | grep <container_name>
|
||||
|
||||
# View recent logs
|
||||
docker logs --tail 100 <container_name>
|
||||
|
||||
# Look for errors
|
||||
docker inspect <container_name> | grep -i error
|
||||
```
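
Because the JSON from `docker inspect` contains an `Error` key for every container, grepping it for "error" tends to match even when nothing is wrong. A more targeted sketch uses a Go template to print just the state fields; `container_state` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print status, exit code, and any engine-reported error for one container.
container_state() {
  docker inspect --format \
    'status={{.State.Status}} exit={{.State.ExitCode}} error={{.State.Error}}' "$1"
}

# Example usage:
#   container_state ollama
```

An empty `error=` field with a non-zero exit code usually means the process inside the container failed on its own, so the container logs are the next place to look.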
|
||||
|
||||
**Common Fixes:**
|
||||
|
||||
**Port Conflict:**
|
||||
```bash
|
||||
# Find what's using the port
|
||||
netstat -tulpn | grep <port>
|
||||
|
||||
# Example: Port 3000 already in use
|
||||
netstat -tulpn | grep 3000
|
||||
|
||||
# Stop conflicting service
|
||||
docker stop <conflicting_container>
|
||||
```
|
||||
|
||||
**Volume Permission Issues:**
|
||||
```bash
|
||||
# Check ownership
|
||||
ls -la /mnt/user/appdata/<container_name>
|
||||
|
||||
# Fix permissions (Unraid standard: 99:100)
|
||||
chown -R 99:100 /mnt/user/appdata/<container_name>
|
||||
|
||||
# Example: Fix Vaultwarden
|
||||
chown -R 99:100 /mnt/user/appdata/vaultwarden
|
||||
```
|
||||
|
||||
**Dependency Missing:**
|
||||
```bash
|
||||
# Example: Guacamole needs MariaDB
|
||||
docker start mariadb
|
||||
sleep 10 # Wait for database initialization
|
||||
docker start ApacheGuacamole
|
||||
|
||||
# Verify dependency is running
|
||||
docker ps | grep mariadb
|
||||
```
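
A fixed `sleep 10` is a guess at how long the dependency needs. As an alternative sketch, the helper below polls until a condition holds or a timeout expires; `wait_for` and `is_running` are hypothetical names, and a running container is only an approximation of "database ready to accept connections".

```bash
#!/usr/bin/env bash
# Retry a command once per second until it succeeds or the timeout expires.
# Arguments: timeout in seconds, then the command to run.
wait_for() {
  local timeout="$1" elapsed=0
  shift
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# True when the named container reports a running state.
is_running() {
  [ "$(docker inspect --format '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Example usage:
#   docker start mariadb && wait_for 30 is_running mariadb && docker start ApacheGuacamole
```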
|
||||
|
||||
**Resource Exhaustion:**
|
||||
```bash
|
||||
# Check cache usage
|
||||
df -h /mnt/cache
|
||||
|
||||
# If cache full (>90%), clean up
|
||||
docker system prune -a # ⚠️ REMOVES UNUSED IMAGES!
|
||||
|
||||
# Or free space manually
|
||||
# See service-inventory.md for cleanup recommendations
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 4: Network Connectivity Issues
|
||||
|
||||
**Can't Access from LAN:**
|
||||
|
||||
```bash
|
||||
# SSH into Unraid (via PiKVM if network down)
|
||||
ssh root@192.168.68.51
|
||||
|
||||
# Check if br0 is up
|
||||
ip addr show br0
|
||||
# Should show: 192.168.68.51/22
|
||||
|
||||
# Verify IP and routes
|
||||
ip route | grep default
|
||||
# Should show: default via 192.168.68.1
|
||||
|
||||
# Test router connectivity
|
||||
ping -c 3 192.168.68.1
|
||||
|
||||
# Test internet
|
||||
ping -c 3 8.8.8.8
|
||||
|
||||
# Test DNS (Pi-hole)
|
||||
nslookup google.com 192.168.68.61
|
||||
```
|
||||
|
||||
**Fix Network Issues:**
|
||||
|
||||
```bash
|
||||
# Restart networking (from console/PiKVM)
|
||||
/etc/rc.d/rc.inet1 restart
|
||||
|
||||
# If that doesn't work, reboot
|
||||
reboot
|
||||
```
|
||||
|
||||
**Can't Access Containers:**
|
||||
|
||||
```bash
|
||||
# Check Docker network
|
||||
docker network inspect bridge
|
||||
|
||||
# Verify container IP
|
||||
docker inspect <container_name> | grep IPAddress
|
||||
|
||||
# Test from Unraid host
|
||||
curl http://172.17.0.5:8080 # Example: open-webui
|
||||
|
||||
# Test port mapping
|
||||
curl http://192.168.68.51:3000 # Should reach open-webui
|
||||
```
|
||||
|
||||
**DNS Not Resolving:**
|
||||
|
||||
```bash
|
||||
# Test Pi-hole directly
|
||||
nslookup google.com 192.168.68.61
|
||||
|
||||
# If Pi-hole down, check Pi Zero
|
||||
ping 192.168.68.61
|
||||
|
||||
# SSH to Pi-hole
|
||||
ssh pi@192.168.68.61
|
||||
|
||||
# Check Pi-hole status
|
||||
pihole status
|
||||
|
||||
# Restart if needed
|
||||
pihole restartdns
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Scenario 5: Array Won't Start
|
||||
|
||||
**Symptoms:**
|
||||
- Unraid GUI accessible but array shows "Stopped"
|
||||
- Disks show errors or missing
|
||||
|
||||
**Troubleshooting:**
|
||||
|
||||
```bash
|
||||
# Check disk health
|
||||
smartctl -a /dev/sdb # Parity
|
||||
smartctl -a /dev/sdc # Disk 1
|
||||
|
||||
# View disk assignments
|
||||
cat /boot/config/disk.cfg
|
||||
|
||||
# Check for filesystem errors (read-only check)
|
||||
xfs_repair -n /dev/md1p1
|
||||
```
|
||||
|
||||
**Common Causes:**
|
||||
- Parity sync in progress (wait for completion)
|
||||
- Disk failed (check SMART, may need replacement)
|
||||
- Unclean shutdown (filesystem check required)
|
||||
- Disk assignment changed
|
||||
|
||||
**Recovery:**
|
||||
|
||||
1. **Start Array in Maintenance Mode**
|
||||
- On Main → Array Operation, tick the "Maintenance mode" checkbox
- Click "Start"
|
||||
- Run filesystem check if prompted
|
||||
|
||||
2. **Review Logs**
|
||||
- Tools → System Log
|
||||
- Look for disk errors
|
||||
- Check for power events
|
||||
|
||||
3. **If Disk Failed**
|
||||
- Follow Unraid disk replacement procedure
|
||||
- Do NOT format or write to disk unnecessarily
|
||||
- Seek help in Unraid forums if uncertain
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Critical Service Restart Procedures
|
||||
|
||||
### Restart Core Services (Proper Order)
|
||||
|
||||
**1. Infrastructure First:**
|
||||
```bash
|
||||
# Start reverse proxy (for routing)
|
||||
docker start NginxProxyManager
|
||||
|
||||
# Wait for it to be ready
|
||||
sleep 5
|
||||
docker ps | grep NginxProxyManager
|
||||
|
||||
# Start tunnel (for remote access)
|
||||
docker start Unraid-Cloudflared-Tunnel  # container name as listed in service-inventory.md
|
||||
|
||||
# Verify both running
|
||||
docker ps | grep -E "NginxProxyManager|Cloudflared"
|
||||
```
|
||||
|
||||
**2. Security Services:**
|
||||
```bash
|
||||
# Password manager (critical!)
|
||||
docker start vaultwarden
|
||||
|
||||
# Wait for healthy status
|
||||
sleep 10
|
||||
docker ps | grep vaultwarden
|
||||
# Should show "(healthy)"
|
||||
|
||||
# If not healthy, check logs
|
||||
docker logs --tail 50 vaultwarden
|
||||
```
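A fixed `sleep 10` is a guess: it is too long on a good day and too short on a bad one. A hedged alternative is to poll the container state until it is ready. This sketch assumes nothing beyond the Docker CLI; `wait_for_container` is a hypothetical helper name:

```bash
# Hypothetical helper: poll until a container is running (or healthy, if it defines a healthcheck).
# Usage: wait_for_container <name> [timeout_seconds]
wait_for_container() {
  local name=$1 timeout=${2:-60} state
  # Containers without a healthcheck have no .State.Health, so fall back to .State.Status.
  local fmt='{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}'
  while [ "$timeout" -gt 0 ]; do
    state=$(docker inspect -f "$fmt" "$name" 2>/dev/null)
    case $state in
      healthy|running) echo "$name: $state"; return 0 ;;
    esac
    sleep 1
    timeout=$((timeout - 1))
  done
  echo "$name: not ready (last state: ${state:-unknown})" >&2
  return 1
}
```

For example, `docker start vaultwarden && wait_for_container vaultwarden 60` returns as soon as the healthcheck passes and fails loudly if it never does.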
|
||||
|
||||
**3. Development Tools:**
|
||||
```bash
|
||||
# Git server
|
||||
docker start Gitea
|
||||
|
||||
# Wait for initialization
|
||||
sleep 5
|
||||
|
||||
# Remote access gateway
|
||||
# Note: if Guacamole is configured to use MariaDB, start the MariaDB container first
docker start ApacheGuacamole
|
||||
```
|
||||
|
||||
**4. Monitoring (IMPORTANT!):**
|
||||
```bash
|
||||
# Database first
|
||||
docker start Influxdb
|
||||
|
||||
# Wait for DB to initialize
|
||||
sleep 15
|
||||
|
||||
# Then metrics collector
|
||||
docker start Telegraf
|
||||
|
||||
# Finally visualization
|
||||
docker start Grafana
|
||||
|
||||
# Verify all running
|
||||
docker ps | grep -E "Influxdb|Telegraf|Grafana"
|
||||
```
|
||||
|
||||
**5. Optional Services:**
|
||||
```bash
|
||||
# LLM backend
|
||||
docker start ollama
|
||||
sleep 10
|
||||
|
||||
# LLM interface
|
||||
docker start open-webui
|
||||
|
||||
# Wait for healthy
|
||||
docker ps | grep open-webui
|
||||
```
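The tiers above only help if a failure in an early tier stops the sequence. A minimal sketch of a fail-fast starter, assuming the container names used in this guide; `start_in_order` is a hypothetical helper name:

```bash
# Hypothetical helper: start containers in the order given and stop at the first failure.
start_in_order() {
  local c
  for c in "$@"; do
    echo "Starting $c..."
    docker start "$c" >/dev/null || { echo "Failed to start $c" >&2; return 1; }
  done
}
```

For example, `start_in_order Influxdb Telegraf Grafana` would refuse to start Telegraf and Grafana if the database container cannot be started. It does not wait for readiness, so keep the pauses between tiers.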
|
||||
|
||||
---
|
||||
|
||||
### Stop All Services Gracefully
|
||||
|
||||
```bash
|
||||
# Stop all running containers
|
||||
docker stop $(docker ps -q)
|
||||
|
||||
# Verify all stopped
|
||||
docker ps
|
||||
# Should show only the header row (no containers listed)
|
||||
|
||||
# Wait before stopping array
|
||||
sleep 5
|
||||
|
||||
# Stop array (from GUI)
|
||||
# Main → Array Operation → Stop
|
||||
```
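`docker stop $(docker ps -q)` prints a usage error when nothing is running, and the default 10-second grace period can be short for databases. A small sketch that handles both; `stop_all_containers` is a hypothetical helper and the 30-second timeout is an assumption to tune:

```bash
# Hypothetical helper: stop every running container, if there are any.
stop_all_containers() {
  local ids
  ids=$(docker ps -q)
  [ -z "$ids" ] && { echo "No running containers"; return 0; }
  # -t 30: allow up to 30 s for a clean exit before Docker sends SIGKILL (default is 10 s).
  # $ids is deliberately unquoted so each ID becomes its own argument.
  docker stop -t 30 $ids
}
```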
|
||||
|
||||
---
|
||||
|
||||
## 📦 Backup & Restore Procedures
|
||||
|
||||
### USB Flash Backup (Unraid Configuration)
|
||||
|
||||
**Create Backup:**
|
||||
1. Navigate to: **Main → Flash → Flash Backup**
|
||||
2. Click "Backup Now"
|
||||
3. Download ZIP file (e.g., `unraid-flash-backup-20251031.zip`)
|
||||
4. Store securely OFF-SERVER:
|
||||
- OneDrive: `/z_Unraid/Backups/`
|
||||
- External drive
|
||||
- Cloud storage
|
||||
|
||||
**Restore from Backup:**
|
||||
```
|
||||
1. Format new USB drive as FAT32 with the volume label UNRAID (if needed)
|
||||
2. Copy backup ZIP to new USB
|
||||
3. Extract contents to root of USB
|
||||
- config/ directory
|
||||
- bzimage, bzroot, etc.
|
||||
4. Safely eject USB
|
||||
5. Boot from new USB
|
||||
6. Configuration restored automatically (a new USB drive has a different GUID, so the license key must be transferred to it)
|
||||
```
|
||||
|
||||
**Frequency:**
|
||||
- Weekly minimum
|
||||
- After ANY configuration change
|
||||
- Before major updates
|
||||
|
||||
---
|
||||
|
||||
### Container Data Backup
|
||||
|
||||
**Critical Directories:**
|
||||
|
||||
```
|
||||
Priority 1 (CRITICAL):
|
||||
/mnt/user/appdata/vaultwarden/ 🚨 Your passwords!
|
||||
/mnt/user/appdata/gitea/ 🚨 Your code repositories!
|
||||
|
||||
Priority 2 (Important):
|
||||
/mnt/user/appdata/NginxProxyManager/ Proxy configs
|
||||
/mnt/user/appdata/Grafana/ Dashboards
|
||||
/mnt/user/appdata/Influxdb/ Metrics history
|
||||
|
||||
Priority 3 (Optional):
|
||||
/mnt/user/appdata/open-webui/ LLM chat history
|
||||
```
|
||||
|
||||
**Quick Backup Script:**
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# Save as: /mnt/user/scripts/backup-critical.sh
|
||||
|
||||
BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)"
|
||||
mkdir -p "$BACKUP_DIR"
|
||||
|
||||
echo "Stopping containers..."
|
||||
docker stop vaultwarden Gitea NginxProxyManager
|
||||
|
||||
echo "Backing up data..."
|
||||
tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" /mnt/user/appdata/vaultwarden
|
||||
tar -czf "$BACKUP_DIR/gitea.tar.gz" /mnt/user/appdata/gitea
|
||||
tar -czf "$BACKUP_DIR/npm.tar.gz" /mnt/user/appdata/NginxProxyManager
|
||||
|
||||
echo "Restarting containers..."
|
||||
docker start vaultwarden Gitea NginxProxyManager
|
||||
|
||||
echo "✅ Backup complete: $BACKUP_DIR"
|
||||
ls -lh "$BACKUP_DIR"
|
||||
```
|
||||
|
||||
**Make Executable:**
|
||||
```bash
|
||||
chmod +x /mnt/user/scripts/backup-critical.sh
|
||||
```
|
||||
|
||||
**Run Manually:**
|
||||
```bash
|
||||
/mnt/user/scripts/backup-critical.sh
|
||||
```
|
||||
|
||||
**Schedule (User Scripts Plugin):**
|
||||
- Frequency: Daily at 2 AM
|
||||
- Retention: Keep last 30 days
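The backup script above never deletes anything, so the 30-day retention has to be enforced separately. A minimal sketch, assuming the timestamped directories sit directly under `/mnt/user/backups`; `prune_backups` is a hypothetical helper name:

```bash
# Hypothetical retention step: delete backup directories older than N days (default 30).
prune_backups() {
  local root=$1 days=${2:-30}
  [ -d "$root" ] || { echo "No such directory: $root" >&2; return 1; }
  # -mindepth/-maxdepth 1: only the timestamped directories directly under $root
  find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -exec rm -rf {} +
}
```

Adding `prune_backups /mnt/user/backups 30` as the last line of the scheduled script would keep roughly a month of history. Try it with `-print` in place of `-exec rm -rf {} +` first to see what would be removed.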
|
||||
|
||||
---
|
||||
|
||||
**Restore from Backup:**
|
||||
|
||||
```bash
|
||||
# Example: Restore Vaultwarden
|
||||
docker stop vaultwarden
|
||||
|
||||
# Backup current (corrupted) data
|
||||
mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old
|
||||
|
||||
# Extract backup
|
||||
tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C /
|
||||
|
||||
# Restart container
|
||||
docker start vaultwarden
|
||||
|
||||
# Verify working
|
||||
curl http://192.168.68.51:4743
|
||||
```
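Before moving the current data aside, it is worth confirming that the archive is readable and actually contains the directory you expect. A hedged sketch; `verify_backup` is a hypothetical helper name:

```bash
# Hypothetical pre-restore check: succeed only if the archive lists cleanly
# and contains the expected path.
verify_backup() {
  local archive=$1 expected=$2
  # tar strips the leading "/" when creating archives, so members look like mnt/user/appdata/...
  tar -tzf "$archive" 2>/dev/null | grep -q "^${expected#/}"
}
```

For example: `verify_backup /mnt/user/backups/20251031_120000/vaultwarden.tar.gz /mnt/user/appdata/vaultwarden && echo "archive OK"`. If that prints nothing, pick a different backup before touching the live data.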
|
||||
|
||||
---
|
||||
|
||||
## ⚡ Quick Commands Reference
|
||||
|
||||
### System Status
|
||||
|
||||
```bash
|
||||
# System uptime and load
|
||||
uptime
|
||||
|
||||
# Resource usage
|
||||
free -h
|
||||
df -h
|
||||
|
||||
# Array status
|
||||
mdcmd status | grep mdState
|
||||
|
||||
# Docker container summary
|
||||
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
|
||||
|
||||
# Temperature (if sensors installed)
|
||||
sensors
|
||||
|
||||
# Disk health quick check
|
||||
smartctl -H /dev/sdb # Parity
|
||||
smartctl -H /dev/sdc # Disk 1
|
||||
```
|
||||
|
||||
### Docker Quick Commands
|
||||
|
||||
```bash
|
||||
# Start all stopped containers (⚠️ also starts ones you stopped on purpose)
docker start $(docker ps -aq -f status=exited)
|
||||
|
||||
# Stop all running containers
|
||||
docker stop $(docker ps -q)
|
||||
|
||||
# View logs (last 50 lines)
|
||||
docker logs --tail 50 <container_name>
|
||||
|
||||
# Follow logs in real-time
|
||||
docker logs -f <container_name>
|
||||
|
||||
# Restart container
|
||||
docker restart <container_name>
|
||||
|
||||
# Remove container (⚠️ will lose non-volume data!)
|
||||
docker rm <container_name>
|
||||
|
||||
# Clean up unused resources
|
||||
docker system prune # ⚠️ Removes ALL stopped containers, unused networks, dangling images, build cache
|
||||
docker system prune -a # ⚠️ Removes unused images too!
|
||||
docker system prune --volumes # ⚠️ Removes unused volumes!
|
||||
```
|
||||
|
||||
### Network Diagnostics
|
||||
|
||||
```bash
|
||||
# Check all interfaces
|
||||
ip addr show
|
||||
|
||||
# Test key infrastructure
|
||||
ping -c 3 192.168.68.1 # Router
|
||||
ping -c 3 192.168.68.51 # Unraid
|
||||
ping -c 3 192.168.68.61 # Pi-hole
|
||||
ping -c 3 8.8.8.8 # Internet
|
||||
|
||||
# DNS resolution test
|
||||
nslookup google.com
|
||||
nslookup google.com 192.168.68.61 # Test Pi-hole specifically
|
||||
|
||||
# Check listening ports
|
||||
netstat -tulpn | grep LISTEN
|
||||
|
||||
# Test specific port
|
||||
nc -zv 192.168.68.51 3002 # Example: Gitea
|
||||
curl -I http://192.168.68.51:3002 # HTTP test
|
||||
```
|
||||
|
||||
### Quick Health Check Script
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# Save as: /mnt/user/scripts/health-check.sh
|
||||
|
||||
echo "=== Unraid Health Check ==="
|
||||
echo ""
|
||||
|
||||
echo "1. Array Status:"
|
||||
mdcmd status | grep mdState
|
||||
|
||||
echo ""
|
||||
echo "2. Running Containers:"
|
||||
docker ps --format "table {{.Names}}\t{{.Status}}"
|
||||
|
||||
echo ""
|
||||
echo "3. Disk Usage:"
|
||||
df -h | grep -E "cache|disk1|Filesystem"
|
||||
|
||||
echo ""
|
||||
echo "4. Network Connectivity:"
|
||||
ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo " Router: ✅ OK" || echo " Router: ❌ FAIL"
|
||||
ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo " Internet: ✅ OK" || echo " Internet: ❌ FAIL"
|
||||
ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo " Pi-hole: ✅ OK" || echo " Pi-hole: ❌ FAIL"
|
||||
|
||||
echo ""
|
||||
echo "5. Critical Services:"
|
||||
curl -sf --max-time 5 http://localhost:4743 >/dev/null && echo " Vaultwarden: ✅ OK" || echo " Vaultwarden: ❌ DOWN"
curl -sf --max-time 5 http://localhost:3002 >/dev/null && echo " Gitea: ✅ OK" || echo " Gitea: ❌ DOWN"
curl -sf --max-time 5 http://localhost:7818 >/dev/null && echo " NPM: ✅ OK" || echo " NPM: ❌ DOWN"
|
||||
|
||||
echo ""
|
||||
echo "=== Health Check Complete ==="
|
||||
```
|
||||
|
||||
**Run:** `bash /mnt/user/scripts/health-check.sh`
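The script above prints results but always exits 0, so a scheduler cannot tell a healthy run from a failed one. One possible refinement is a small wrapper that counts failures; `check` and `failures` are hypothetical names:

```bash
# Hypothetical wrapper: run a check, print a status line, and count failures.
failures=0
check() {
  local label=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "  $label: ✅ OK"
  else
    echo "  $label: ❌ FAIL"
    failures=$((failures + 1))
  fi
}
```

Usage might look like `check "Router" ping -c 2 192.168.68.1`, then `check "Vaultwarden" curl -sf http://localhost:4743`, ending the script with `exit "$failures"` so that a non-zero exit code signals at least one failed check.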
|
||||
|
||||
---
|
||||
|
||||
## 📞 Getting Help
|
||||
|
||||
### Pre-flight Checks
|
||||
|
||||
Before asking for help, gather this information:
|
||||
|
||||
1. **System Diagnostics**
|
||||
- Unraid WebGUI: Tools → Diagnostics → Download
|
||||
- Creates ZIP with all logs
|
||||
|
||||
2. **Container Logs**
|
||||
```bash
|
||||
docker logs <container_name> > container-logs.txt 2>&1
|
||||
```
|
||||
|
||||
3. **Network Configuration**
|
||||
```bash
|
||||
ip addr show > network-config.txt
|
||||
ip route show >> network-config.txt
|
||||
```
|
||||
|
||||
4. **Disk Status**
|
||||
```bash
|
||||
smartctl -a /dev/sdb > disk-smart.txt
|
||||
smartctl -a /dev/sdc >> disk-smart.txt
|
||||
```
|
||||
|
||||
### Community Resources
|
||||
|
||||
- **Unraid Forums:** https://forums.unraid.net/
|
||||
- Post diagnostics ZIP
|
||||
- Be specific about symptoms
|
||||
- Include what you've tried
|
||||
|
||||
- **r/unraid:** https://reddit.com/r/unraid
|
||||
- Quick questions
|
||||
- Share diagnostics in pastebin
|
||||
|
||||
- **Discord:** Unraid Official Discord
|
||||
- Real-time help
|
||||
- Active community
|
||||
|
||||
### Emergency Contacts
|
||||
|
||||
```
|
||||
ISP Support: [Your ISP Phone Number]
|
||||
Unraid License: [Store in secure location]
|
||||
USB Backup Location: [Document where stored]
|
||||
Off-site Backup: [If applicable]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🎓 Post-Recovery Checklist
|
||||
|
||||
After restoring from disaster:
|
||||
|
||||
```
|
||||
[ ] Unraid array started successfully
|
||||
[ ] All critical services running
|
||||
[ ] NginxProxyManager
|
||||
[ ] Cloudflared
|
||||
[ ] Vaultwarden
|
||||
[ ] Gitea
|
||||
[ ] Network connectivity verified
|
||||
[ ] Can access Unraid WebUI
|
||||
[ ] Can ping router (192.168.68.1)
|
||||
[ ] Internet working
|
||||
[ ] DNS resolving (Pi-hole)
|
||||
[ ] Vaultwarden accessible (test password retrieval)
|
||||
[ ] Gitea accessible (verify repositories intact)
|
||||
[ ] NPM routing working (test reverse proxy)
|
||||
[ ] Monitoring stack restarted
|
||||
[ ] Grafana
|
||||
[ ] InfluxDB
|
||||
[ ] Telegraf
|
||||
[ ] External access working
|
||||
[ ] Tailscale connected
|
||||
[ ] Cloudflare tunnel active
|
||||
[ ] Backups verified and up-to-date
|
||||
[ ] Documentation updated with lessons learned
|
||||
[ ] Incident documented in change log (Gitea)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Security After Recovery
|
||||
|
||||
**Immediately After Disaster Recovery:**
|
||||
|
||||
1. **Change Passwords** (if compromise suspected)
|
||||
```
|
||||
[ ] Unraid root password
|
||||
[ ] Vaultwarden master password
|
||||
[ ] Container admin passwords
|
||||
[ ] Pi-hole admin password
|
||||
[ ] PiKVM password
|
||||
```
|
||||
|
||||
2. **Review Access Logs**
|
||||
```bash
|
||||
# Check SSH attempts
|
||||
grep "Failed password" /var/log/syslog | tail -50 # Unraid writes sshd messages to syslog, not auth.log
|
||||
|
||||
# Check NPM access
|
||||
docker logs NginxProxyManager 2>&1 | grep -i error
|
||||
|
||||
# Check Gitea access
|
||||
docker logs Gitea 2>&1 | grep -i login
|
||||
```
|
||||
|
||||
3. **Verify Firewall Rules**
|
||||
```bash
|
||||
iptables -L -n -v
|
||||
```
|
||||
|
||||
4. **Check for Unauthorized Changes**
|
||||
```bash
|
||||
# Review Docker containers
|
||||
docker ps -a
|
||||
|
||||
# Check cron jobs
|
||||
crontab -l
|
||||
|
||||
# Review network interfaces
|
||||
ip addr show
|
||||
```
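Reviewing `docker ps -a` by eye only works if you remember what should be there. A hedged sketch that compares the current container list against a saved baseline; `unexpected_containers` and the baseline file are hypothetical and not part of the existing setup:

```bash
# Hypothetical check: print containers that exist now but are not in the baseline file.
# The baseline must be sorted, one container name per line.
unexpected_containers() {
  docker ps -a --format '{{.Names}}' | sort | comm -13 "$1" -
}
```

The baseline could be created on a known-good day with `docker ps -a --format '{{.Names}}' | sort > /boot/config/container-baseline.txt` (the path is only an example). After an incident, `unexpected_containers /boot/config/container-baseline.txt` prints nothing if no new containers have appeared.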
|
||||
|
||||
---
|
||||
|
||||
## 📝 Documentation Updates After Incident
|
||||
|
||||
**What to Document:**
|
||||
|
||||
1. **What Happened:**
|
||||
- Date/time of incident
|
||||
- Symptoms observed
|
||||
- Root cause (if determined)
|
||||
- Duration of outage
|
||||
|
||||
2. **What You Did:**
|
||||
- Steps taken to recover
|
||||
- What worked / didn't work
|
||||
- Resources used (forums, docs, etc.)
|
||||
- Time to recovery
|
||||
|
||||
3. **Lessons Learned:**
|
||||
- What could prevent this in future
|
||||
- Process improvements needed
|
||||
- Documentation gaps discovered
|
||||
- Backup improvements needed
|
||||
|
||||
4. **Action Items:**
|
||||
- Backups to implement/improve
|
||||
- Monitoring to add
|
||||
- Scripts to create
|
||||
- Hardware to replace/upgrade
|
||||
|
||||
**Where to Document:**
|
||||
- Create incident report: `docs/incidents/YYYY-MM-DD-incident-name.md`
|
||||
- Update this quick-start guide with new procedures
|
||||
- Add to troubleshooting section if recurring issue
|
||||
- Commit to Gitea with detailed message
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Normal Startup Sequence
|
||||
|
||||
**From Cold Boot:**
|
||||
|
||||
```
|
||||
1. Power on server
|
||||
↓
|
||||
2. BIOS POST (~30 seconds)
|
||||
- Hardware check
|
||||
- Memory test
|
||||
- Drive detection
|
||||
↓
|
||||
3. Unraid boots from USB (~1-2 minutes)
|
||||
- Linux kernel loads
|
||||
- Unraid OS starts
|
||||
↓
|
||||
4. Network initializes
|
||||
- br0 interface up
|
||||
- Gets IP: 192.168.68.51
|
||||
↓
|
||||
5. Array auto-starts (if configured)
|
||||
- Parity disk: sdb
|
||||
- Data disk: sdc
|
||||
- Cache: nvme1n1p1
|
||||
↓
|
||||
6. Docker service starts
|
||||
- docker0 bridge created
|
||||
- Networks initialized
|
||||
↓
|
||||
7. Containers auto-start (if enabled)
|
||||
- Infrastructure services first
|
||||
- Then application services
|
||||
↓
|
||||
8. Services available (~3-5 minutes total)
|
||||
✅ Ready to use!
|
||||
```
|
||||
|
||||
**Expected Boot Time:** 3-5 minutes
|
||||
**If Taking Longer:** Check system log for errors
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Quick Health Check Command
|
||||
|
||||
**Run After Any Restart:**
|
||||
|
||||
```bash
|
||||
# Quick one-liner health check
|
||||
docker ps --format "table {{.Names}}\t{{.Status}}"; \
df -h | grep -E "cache|disk1"; \
{ ping -c 2 192.168.68.1 >/dev/null && echo "Network: OK" || echo "Network: FAIL"; }
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📚 Related Documentation
|
||||
|
||||
- **Network Issues:** See `network-map.md`
|
||||
- **Service Details:** See `service-inventory.md`
|
||||
- **Container Configs:** See `docker-compose/` (when created)
|
||||
- **Main Overview:** See `README.md`
|
||||
|
||||
---
|
||||
|
||||
## 🆘 True Emergency - Complete System Down
|
||||
|
||||
**If everything is down and you need immediate help:**
|
||||
|
||||
1. **Access via PiKVM**
|
||||
- https://192.168.68.53
|
||||
- Get console access
|
||||
- View what's happening
|
||||
|
||||
2. **Check Physical Server**
|
||||
- Power LED on?
|
||||
- Fans spinning?
|
||||
- Drives spinning up?
|
||||
- Network activity lights?
|
||||
|
||||
3. **Try Safe Mode Boot**
|
||||
- Boot Unraid in GUI Safe Mode (no plugins) from the USB boot menu
|
||||
- Diagnose from console
|
||||
|
||||
4. **Community Help**
|
||||
- Unraid Discord (fastest response)
|
||||
- Forums with diagnostics ZIP
|
||||
- r/unraid for quick questions
|
||||
|
||||
5. **Document Everything**
|
||||
- Take photos/screenshots via PiKVM
|
||||
- Note exact error messages
|
||||
- Record what you tried
|
||||
- Timeline of events
|
||||
|
||||
---
|
||||
|
||||
## 💡 Pro Tips
|
||||
|
||||
1. **Test Your Backups**
|
||||
- Restore test annually
|
||||
- Verify data integrity
|
||||
- Practice recovery procedures
|
||||
|
||||
2. **Keep This Guide Accessible**
|
||||
- Save offline copy to phone/laptop
|
||||
- Print critical sections
|
||||
- Bookmark in browser
|
||||
|
||||
3. **Automate Where Possible**
|
||||
- Schedule backup scripts
|
||||
- Set up monitoring alerts
|
||||
- Use User Scripts plugin
|
||||
|
||||
4. **Document As You Go**
|
||||
- Update after fixing issues
|
||||
- Add new procedures discovered
|
||||
- Note what worked/didn't work
|
||||
|
||||
---
|
||||
|
||||
**Last Updated:** October 31, 2025
|
||||
**Next Review:** Quarterly or after incidents
|
||||
**Maintained By:** Weston
|
||||
|
||||
---
|
||||
|
||||
**Remember:** Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help!
|
||||
|
||||
**Keep this guide accessible even when the server is down!**
|
||||
💡 **Pro Tip:** Save a copy to your phone/laptop/OneDrive!
|
||||
|
||||
🚀 **You've got this!**
|
||||
614
service-inventory.md
Normal file
@@ -0,0 +1,614 @@
|
||||
# 📦 Service Inventory - Complete Container Catalog
|
||||
|
||||
**Last Updated:** October 31, 2025
|
||||
**Total Containers:** 32 (6 running, 26 stopped)
|
||||
**Purpose:** Comprehensive catalog of all services
|
||||
|
||||
---
|
||||
|
||||
## 📊 Quick Stats
|
||||
|
||||
| Metric | Value | Status |
|
||||
|--------|-------|--------|
|
||||
| **Total Containers** | 32 | - |
|
||||
| **Running** | 6 | ✅ 19% |
|
||||
| **Stopped** | 26 | ⚠️ 81% |
|
||||
| **Total Docker Images** | ~50GB | ⚠️ High |
|
||||
| **Cache Usage** | 578GB / 932GB | ⚠️ 62% |
|
||||
|
||||
**Key Insight:** 81% of containers are stopped - cleanup opportunity!
|
||||
|
||||
---
|
||||
|
||||
## 🟢 Running Services (6 containers)
|
||||
|
||||
### 1. open-webui ⭐⭐⭐
|
||||
|
||||
**Status:** Running (healthy)
|
||||
**Container:** open-webui
|
||||
**Image:** ghcr.io/open-webui/open-webui:main (4.55GB)
|
||||
**Created:** 2025-10-16 (2 weeks ago)
|
||||
**Network:** bridge (172.17.0.5)
|
||||
**Ports:** 8080 → 3000
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.15%
|
||||
- Memory: 1.026GB / 60.55GB (1.69%)
|
||||
- Storage: 42.4MB
|
||||
|
||||
**Purpose:** LLM chat interface (ChatGPT-like UI for local models)
|
||||
|
||||
**Dependencies:**
|
||||
- ollama (currently STOPPED ❌)
|
||||
- OpenAI API key (configured)
|
||||
|
||||
**Access:**
|
||||
- Local: http://192.168.68.51:3000
|
||||
- No authentication by default
|
||||
|
||||
**Issues:**
|
||||
- ⚠️ Depends on ollama container which is stopped
|
||||
- ⚠️ OpenAI API key exposed in environment variables
|
||||
|
||||
**Recommendations:**
|
||||
1. ✅ **KEEP** - Active LLM interface
|
||||
2. Restart ollama container to enable local models
|
||||
3. Move API keys to Docker secrets
|
||||
4. Enable authentication
|
||||
|
||||
**Priority:** HIGH - Core AI/ML service
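To see which containers carry secrets in their environment without echoing the values to the terminal, a sketch like the following can help. `secret_env_names` is a hypothetical helper, and the keyword list (key, token, secret, password) is an assumption that will miss unusually named variables:

```bash
# Hypothetical audit: list env var NAMES that look like secrets, never their values.
secret_env_names() {
  docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' "$1" |
    grep -iE '^[^=]*(key|token|secret|password)[^=]*=' | cut -d= -f1
}
```

For example, `secret_env_names open-webui` prints only the variable names, which is safe to paste into notes or a Gitea issue while planning the move to a secrets mechanism.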
|
||||
|
||||
---
|
||||
|
||||
### 2. NginxProxyManager ⭐⭐⭐
|
||||
|
||||
**Status:** Running
|
||||
**Container:** NginxProxyManager
|
||||
**Image:** jlesage/nginx-proxy-manager (189MB)
|
||||
**Created:** 2025-10-11 (3 weeks ago)
|
||||
**Network:** bridge (172.17.0.4)
|
||||
**Ports:** 4443→18443, 8080→1880, 8181→7818
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.08%
|
||||
- Memory: 77.45MB (0.12%)
|
||||
- Storage: 13.4KB
|
||||
|
||||
**Purpose:** Reverse proxy with web UI - SSL termination and routing
|
||||
|
||||
**Dependencies:** None
|
||||
|
||||
**Access:**
|
||||
- Admin UI: http://192.168.68.51:7818
|
||||
- HTTP: http://192.168.68.51:1880
|
||||
- HTTPS: https://192.168.68.51:18443
|
||||
|
||||
**Configuration:**
|
||||
- Routes traffic to backend services
|
||||
- Manages SSL certificates
|
||||
- Provides access control
|
||||
|
||||
**Recommendations:**
|
||||
1. ✅ **KEEP** - Critical infrastructure
|
||||
2. Document all proxy rules in Gitea
|
||||
3. Verify SSL auto-renewal is configured
|
||||
4. Enable MFA if available
|
||||
5. Review access logs regularly
|
||||
|
||||
**Priority:** CRITICAL - Core infrastructure
|
||||
|
||||
---
|
||||
|
||||
### 3. Gitea ⭐⭐⭐
|
||||
|
||||
**Status:** Running
|
||||
**Container:** Gitea
|
||||
**Image:** gitea/gitea (180MB)
|
||||
**Created:** 2025-10-08 (3 weeks ago)
|
||||
**Network:** bridge (172.17.0.3)
|
||||
**Ports:** 22→22, 3000→3002
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.11%
|
||||
- Memory: 114.5MB (0.18%)
|
||||
- Storage: 113MB (active repositories!)
|
||||
|
||||
**Purpose:** Self-hosted Git server (GitHub alternative)
|
||||
|
||||
**Dependencies:** None (internal SQLite)
|
||||
|
||||
**Access:**
|
||||
- Web: http://192.168.68.51:3002
|
||||
- Domain: https://gitea.segelschiff.app
|
||||
- SSH: ssh://192.168.68.51:22 ⚠️ (conflicts with Unraid SSH)
|
||||
|
||||
**Configuration:**
|
||||
- Using latest tag (unpinned version)
|
||||
- Storage: /mnt/user/appdata/gitea
|
||||
|
||||
**Issues:**
|
||||
- ⚠️ SSH port 22 conflicts with Unraid SSH
|
||||
- ⚠️ Using `latest` tag (version not pinned)
|
||||
- ⚠️ Backup strategy unknown
|
||||
|
||||
**Recommendations:**
|
||||
1. ✅ **KEEP** - Critical for version control
|
||||
2. Change SSH port to 2222 to avoid conflict
|
||||
3. Pin to specific version tag
|
||||
4. Implement automated backups (CRITICAL!)
|
||||
5. This is your version control hub - protect it!
|
||||
|
||||
**Priority:** CRITICAL - Infrastructure documentation depends on this
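Unpinned tags are easy to miss across many containers. A minimal sketch of a check that flags image references with no tag or a moving tag; `uses_floating_tag` is a hypothetical helper, and treating only `latest` and `main` as moving tags is an assumption:

```bash
# Hypothetical check: succeed if an image reference has no tag or a moving tag (latest/main).
uses_floating_tag() {
  case $1 in *@sha256:*) return 1 ;; esac   # pinned by digest
  local last=${1##*/}                       # last path component, so a registry port is not read as a tag
  case $last in
    *:latest|*:main) return 0 ;;
    *:*)             return 1 ;;            # some other explicit tag
    *)               return 0 ;;            # no tag at all means Docker pulls :latest
  esac
}
```

To scan everything at once: `docker ps -a --format '{{.Names}} {{.Image}}' | while read -r name image; do uses_floating_tag "$image" && echo "$name: $image"; done`.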
|
||||
|
||||
---
|
||||
|
||||
### 4. ApacheGuacamole ⭐⭐
|
||||
|
||||
**Status:** Running (2+ months uptime!)
|
||||
**Container:** ApacheGuacamole
|
||||
**Image:** jasonbean/guacamole (737MB)
|
||||
**Created:** 2025-08-22 (2+ months ago)
|
||||
**Network:** bridge (172.17.0.2)
|
||||
**Ports:** 8080→4000
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.16%
|
||||
- Memory: 785.8MB (1.27%)
|
||||
- Storage: 46.2MB
|
||||
|
||||
**Purpose:** Clientless remote desktop gateway (RDP/VNC/SSH via browser)
|
||||
|
||||
**Dependencies:**
|
||||
- MariaDB (STOPPED ❌) - **BROKEN DEPENDENCY!**
|
||||
|
||||
**Access:**
|
||||
- Web: http://192.168.68.51:4000
|
||||
|
||||
**Configuration:**
|
||||
- MySQL enabled but MariaDB stopped
|
||||
- Multiple auth modules: MySQL, LDAP, TOTP, etc.
|
||||
|
||||
**Issues:**
|
||||
- 🚨 **CRITICAL:** Depends on MariaDB which is stopped!
|
||||
- Currently using embedded database (not recommended)
|
||||
- Data loss risk without proper database backend
|
||||
|
||||
**Recommendations:**
|
||||
1. ⚠️ **FIX IMMEDIATELY** - Restart MariaDB or reconfigure
|
||||
2. If keeping: Start MariaDB and verify connection
|
||||
3. If not using: Stop Guacamole and remove both
|
||||
4. Document your use case for remote desktop access
|
||||
|
||||
**Priority:** MEDIUM - Fix dependency or remove
|
||||
|
||||
---
|
||||
|
||||
### 5. Cloudflared ⭐⭐⭐
|
||||
|
||||
**Status:** Running (2.5+ months - very stable!)
|
||||
**Container:** Unraid-Cloudflared-Tunnel
|
||||
**Image:** figro/unraid-cloudflared-tunnel (8.92MB)
|
||||
**Created:** 2025-08-10 (2.5+ months ago)
|
||||
**Network:** bridge (172.17.0.6)
|
||||
**Ports:** 46495→46495 (metrics)
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.33% (highest of running containers)
|
||||
- Memory: 68.6MB (0.11%)
|
||||
- Network I/O: 41.7MB RX / 310KB TX
|
||||
|
||||
**Purpose:** Cloudflare Tunnel - secure external access without port forwarding
|
||||
|
||||
**Dependencies:** None
|
||||
|
||||
**Access:**
|
||||
- Metrics: http://192.168.68.51:46495
|
||||
- Domain: *.segelschiff.app (managed via Cloudflare)
|
||||
|
||||
**Configuration:**
|
||||
- Tunnel token configured
|
||||
- No auto-update enabled
|
||||
- Metrics exposed for monitoring
|
||||
|
||||
**Security:**
|
||||
- ⚠️ Tunnel token in plain text environment variable
|
||||
- ✅ No open ports on router (excellent!)
|
||||
|
||||
**Recommendations:**
|
||||
1. ✅ **KEEP** - Excellent security practice
|
||||
2. Rotate tunnel token periodically
|
||||
3. Document which services are exposed
|
||||
4. Integrate metrics with monitoring stack
|
||||
|
||||
**Priority:** HIGH - Critical for secure remote access
|
||||
|
||||
---
|
||||
|
||||
### 6. Vaultwarden ⭐⭐⭐
|
||||
|
||||
**Status:** Running (healthy) - 3+ months uptime!
|
||||
**Container:** vaultwarden
|
||||
**Image:** vaultwarden/server (256MB)
|
||||
**Created:** 2025-07-31 (3+ months ago)
|
||||
**Network:** bridge (172.17.0.7)
|
||||
**Ports:** 80→4743
|
||||
|
||||
**Resources:**
|
||||
- CPU: 0.00% (idle)
|
||||
- Memory: 24.96MB (0.04%) - Very lightweight!
|
||||
|
||||
**Purpose:** Self-hosted password manager (Bitwarden compatible)
|
||||
|
||||
**Dependencies:** None
|
||||
|
||||
**Access:**
|
||||
- Web: http://192.168.68.51:4743
|
||||
- Admin: http://192.168.68.51:4743/admin
|
||||
|
||||
**Configuration:**
|
||||
- Signups allowed: true ⚠️
|
||||
- Invitations allowed: false ✅
|
||||
- WebSocket disabled ⚠️
|
||||
- Admin token exposed ⚠️
|
||||
|
||||
**Issues:**
|
||||
- 🚨 **CRITICAL:** No backup strategy evident!
|
||||
- ⚠️ Admin token in plain text
|
||||
- ⚠️ Signups open (verify intentional)
|
||||
- ⚠️ WebSocket disabled (reduces functionality)
|
||||
|
||||
**Recommendations:**
|
||||
1. ✅ **KEEP** - Critical security infrastructure
|
||||
2. 🚨 **IMPLEMENT BACKUP IMMEDIATELY** - This is your password vault!
|
||||
3. Close signups after initial setup
|
||||
4. Rotate admin token and use secrets management
|
||||
5. Enable WebSocket for better sync
|
||||
6. Automate daily backups to off-site location
|
||||
|
||||
**Priority:** CRITICAL - Contains all your passwords!
|
||||
|
||||
---
|
||||
|
||||
## 🔴 Recently Stopped Services (Worth Investigating)
|
||||
|
||||
### 7. ollama ⚠️
|
||||
|
||||
**Status:** Exited (128) 4 minutes ago
|
||||
**Image:** ollama/ollama (3.33GB)
|
||||
**Purpose:** Local LLM inference engine
|
||||
|
||||
**Why It Matters:** open-webui depends on this!
|
||||
|
||||
**Recommendations:**
|
||||
1. 🔧 **RESTART** - Required for open-webui local models
|
||||
2. Investigate exit code 128 (configuration issue?)
|
||||
3. Configure GPU acceleration (RTX 4090!)
|
||||
4. Test with open-webui after restart
|
||||
|
||||
**Action:** `docker start ollama && docker logs -f ollama`
|
||||
|
||||
---
|
||||
|
||||
### 8. Monitoring Stack (Stopped 12 days ago) 🚨
|
||||
|
||||
**Containers:**
|
||||
- Grafana (stopped 12 days)
|
||||
- InfluxDB (stopped 12 days)
|
||||
- Telegraf (stopped 12 days)
|
||||
|
||||
**Total Size:** ~1.7GB
|
||||
|
||||
**Why Critical:** Zero observability into system health!
|
||||
|
||||
**Recommendations:**
|
||||
1. 🚨 **RESTART IMMEDIATELY** - Priority 1!
|
||||
2. Configure dashboards for:
|
||||
- Docker container stats
|
||||
- System resources (CPU, RAM, disk)
|
||||
- Network traffic
|
||||
- Temperature sensors
|
||||
3. Set up alerting for critical issues
|
||||
4. Document in runbook
|
||||
|
||||
**Action:**
|
||||
```bash
|
||||
docker start Influxdb
|
||||
sleep 15 # Wait for DB initialization
|
||||
docker start Telegraf
|
||||
docker start Grafana
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 9. MariaDB (Stopped 12 days ago) ⚠️
|
||||
|
||||
**Status:** Exited (0) 12 days ago
|
||||
**Image:** lscr.io/linuxserver/mariadb (348MB)
|
||||
**Purpose:** MySQL database for Guacamole
|
||||
|
||||
**Issue:** Guacamole is running but database is stopped!
|
||||
|
||||
**Recommendations:**
|
||||
1. If using Guacamole: **RESTART**
|
||||
2. If not using Guacamole: **REMOVE BOTH**
|
||||
3. Document decision
|
||||
|
||||
---
|
||||
|
||||
### 10. Database Admin Tools (Stopped 12 days ago)
|
||||
|
||||
**CloudBeaver** - Stopped 12 days
|
||||
**adminer** - Stopped 12 days
|
||||
|
||||
**Issue:** Two database admin tools - redundant!
|
||||
|
||||
**Recommendations:**
|
||||
1. **CHOOSE ONE:**
|
||||
- CloudBeaver: Feature-rich (725MB)
|
||||
- adminer: Lightweight (118MB)
|
||||
2. Remove the other
|
||||
3. Only restart if you need database management
|
||||
|
||||
---
|
||||
|
||||
## 🟡 Experimental / Inactive Services (Decision Needed)
|
||||
|
||||
### 11. Nextcloud AIO Stack (7 containers!) 🚨
|
||||
|
||||
**Status:** All stopped 3 weeks ago
|
||||
**Total Size:** ~7GB Docker images + data
|
||||
**Containers:**
|
||||
- nextcloud-aio-mastercontainer
|
||||
- nextcloud-aio-apache
|
||||
- nextcloud-aio-nextcloud (2.19GB)
|
||||
- nextcloud-aio-database (PostgreSQL)
|
||||
- nextcloud-aio-redis
|
||||
- nextcloud-aio-onlyoffice (3.79GB!)
|
||||
- nextcloud-aio-imaginary
|
||||
- nextcloud-aio-notify-push
|
||||
|
||||
**Data:** /mnt/user/nextcloud (~1GB+)
|
||||
|
||||
**Analysis:**
|
||||
- Massive resource footprint
|
||||
- "All-in-One" = heavy coupling
|
||||
- Stopped for 3 weeks suggests not critical
|
||||
|
||||
**Recommendations:**
|
||||
**DECISION REQUIRED:**
|
||||
|
||||
**Option A: Remove Everything**
|
||||
```bash
|
||||
# Backup data first!
|
||||
cp -a /mnt/user/nextcloud /mnt/user/backup/nextcloud-$(date +%Y%m%d)  # -a preserves ownership and permissions
|
||||
|
||||
# Remove containers
|
||||
docker rm $(docker ps -aq --filter "name=nextcloud-aio-")  # a shell glob does not expand to container names
|
||||
|
||||
# Remove images to free space
|
||||
docker rmi $(docker images | grep nextcloud | awk '{print $3}')
|
||||
|
||||
# Archive data
|
||||
tar -czf nextcloud-data-backup.tar.gz /mnt/user/nextcloud
|
||||
```
|
||||
**Saves:** ~7GB+ space
|
||||
|
||||
**Option B: Keep and Restart**
|
||||
- Document why you need it
|
||||
- Create restart procedure
|
||||
- Implement backup strategy
|
||||
- Monitor resource usage
|
||||
|
||||
**My Recommendation:** Remove unless actively needed. Nextcloud is great but this All-in-One stack is heavy.
|
||||
|
||||
---
|
||||
|
||||
### 12. Jellyfin (Stopped 2 weeks ago) ⚠️
|
||||
|
||||
**Status:** Exited (0) 2 weeks ago
|
||||
**Image:** jellyfin/jellyfin (1.25GB)
|
||||
**GPU:** RTX 4090 allocated but idle!
|
||||
|
||||
**Media:**
|
||||
- Movies: /mnt/user/movies
|
||||
- TV: /mnt/user/tv shows
|
||||
- Music: /mnt/user/music
|
||||
|
||||
**Issue:** $1600 GPU sitting idle!
|
||||
|
||||
**Recommendations:**
|
||||
**If you want media server:**
|
||||
1. **RESTART** with hardware transcoding:
|
||||
```bash
|
||||
docker start Jellyfin
|
||||
```
|
||||
2. Configure NVENC/NVDEC for RTX 4090
|
||||
3. Test 4K transcoding performance
|
||||
4. Switch from `host` network to bridge (security)
|
||||
|
||||
**If you don't need media server:**
|
||||
1. Remove GPU allocation from container
|
||||
2. Free GPU for other projects (AI/ML)
|
||||
|
||||
**Action Required:** Decide on media server strategy
|
||||
|
||||
---
|
||||
|
||||
### 13. Large AI/ML Containers (Rarely Used)
|
||||
|
||||
**ebook2audiobook** - 20.06GB! (stopped 3 weeks)
|
||||
**docling-serve** - 14.45GB! (stopped 2 weeks)
|
||||
|
||||
**Total:** 34.5GB for two containers!
|
||||
|
||||
**Analysis:**
|
||||
- Massive images
|
||||
- Rarely used (stopped weeks ago)
|
||||
- Experimental/one-time use?
|
||||
|
||||
**Recommendations:**
|
||||
1. **REMOVE** both to free 34.5GB
|
||||
2. If needed again, pull fresh images
|
||||
3. Document use cases if keeping
|
||||
|
||||
**Potential Savings:** 34.5GB cache space!
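To confirm which images are worth removing, it helps to list them largest-first. A small sketch that sorts lines beginning with a human-readable size; `largest_first` is a hypothetical helper and relies on GNU `sort -h`:

```bash
# Hypothetical helper: sort lines that start with a human-readable size, largest first.
largest_first() {
  sort -rh
}
```

For example: `docker images --format '{{.Size}}\t{{.Repository}}:{{.Tag}}' | largest_first | head`. `docker system df -v` gives a similar breakdown, including how much space is reclaimable.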
|
||||
|
||||
---
|
||||
|
||||
### 14. Productivity Suite (Multiple Stopped)
|
||||
|
||||
**baserow** - Stopped 2 weeks (2.25GB)
|
||||
**NocoDB** - Stopped 3 weeks (588MB)
|
||||
**OpenProject** - Stopped 7 weeks (2.87GB)
|
||||
|
||||
**Issue:** Three overlapping database/project-management tools - redundant!
|
||||
|
||||
**Recommendations:**
|
||||
1. **CHOOSE ONE** (or none if not used)
|
||||
2. Remove the others
|
||||
3. Migrate data if needed first
|
||||
|
||||
**Potential Savings:** ~5GB
|
||||
|
||||
---

### 15. Development Tools

**n8n** (workflow automation) - Created but never started
**steam-headless** - Created but not running

**Recommendations:**
- Document if you have plans for these
- Remove if experimental and abandoned

---

## 📋 Container Decision Matrix

| Container | Keep? | Action | Priority |
|-----------|-------|--------|----------|
| **open-webui** | ✅ Yes | Keep running, restart ollama | HIGH |
| **NginxProxyManager** | ✅ Yes | Keep, document configs | CRITICAL |
| **Gitea** | ✅ Yes | Keep, fix SSH port, backup | CRITICAL |
| **ApacheGuacamole** | ⚠️ Decide | Fix MariaDB OR remove both | MEDIUM |
| **Cloudflared** | ✅ Yes | Keep, rotate token | HIGH |
| **Vaultwarden** | ✅ Yes | Keep, BACKUP NOW! | CRITICAL |
| **ollama** | ✅ Yes | Restart immediately | HIGH |
| **Monitoring Stack** | ✅ Yes | Restart all 3 containers | CRITICAL |
| **MariaDB** | ⚠️ Conditional | If Guacamole stays | MEDIUM |
| **Nextcloud AIO (7)** | ❌ Remove | Backup data, remove stack | LOW |
| **Jellyfin** | ⚠️ Decide | Use GPU or remove | MEDIUM |
| **ebook2audiobook** | ❌ Remove | Free 20GB | LOW |
| **docling-serve** | ❌ Remove | Free 14.5GB | LOW |
| **baserow/NocoDB/OpenProject** | ❌ Choose 1 | Remove others | LOW |
| **CloudBeaver/adminer** | ⚠️ Choose 1 | Keep one DB admin | LOW |

---

## 🎯 Recommended Action Plan

### Phase 1: Critical (Do First!) 🚨

1. **Backup Vaultwarden** (30 min)
```bash
# Stop the container first so the data is not modified mid-archive
docker stop vaultwarden
tar -czf vaultwarden-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/vaultwarden
docker start vaultwarden
```

2. **Backup Gitea** (30 min)
```bash
# Same pattern: stop, archive the appdata directory, start again
docker stop Gitea
tar -czf gitea-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/gitea
docker start Gitea
```

3. **Restart Monitoring Stack** (15 min)
```bash
# Start the database first and give it time to come up
docker start Influxdb && sleep 15
docker start Telegraf Grafana
# Configure dashboards
```

4. **Restart ollama** (5 min)
```bash
docker start ollama
# Follow the logs to confirm a clean startup (Ctrl+C to exit)
docker logs -f ollama
```
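Steps 1 and 2 above follow the same stop/archive/start pattern, so they could be wrapped in one function that always restarts the container, even when `tar` fails. This is a sketch under stated assumptions, not the author's script: the name `backup_container` and the destination directory argument are hypothetical, while the container names and appdata paths come from the steps above.

```bash
# Hypothetical helper: stop a container, archive its appdata directory,
# then start the container again even if the archive step failed.
# Arguments: <container name> <appdata dir> [destination dir, default: current dir]
backup_container() {
  local name="$1" src="$2" dest_dir="${3:-.}"
  local archive="${dest_dir}/${name}-backup-$(date +%Y%m%d).tar.gz"
  local status=0

  docker stop "$name" || return 1

  # -C makes the paths inside the archive relative to the parent directory.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || status=$?

  # Restart regardless of the tar result so the service does not stay down.
  docker start "$name" || status=$?

  if [ "$status" -eq 0 ]; then
    # Listing the archive is a cheap check that it is readable.
    tar -tzf "$archive" > /dev/null || status=$?
  fi
  return "$status"
}

# Usage (destination directory is an assumption, adjust to your setup):
#   backup_container vaultwarden /mnt/user/appdata/vaultwarden /path/to/backups
#   backup_container Gitea /mnt/user/appdata/gitea /path/to/backups
```

Writing the archive somewhere other than the cache drive being cleaned up is worth considering, since a backup on the same disk does not survive a disk failure.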

### Phase 2: Cleanup (Free Space!) 💾

5. **Remove Large Unused Containers** (1 hour)
- ebook2audiobook (20GB)
- docling-serve (14.5GB)
- Nextcloud AIO stack (7GB)
- **Saves: ~41GB!**

6. **Docker System Cleanup**
```bash
# WARNING: besides unused images and build cache, this also deletes
# ALL stopped containers and unused networks. Settle the Phase 3
# decisions (or start everything you plan to keep) before running it.
docker system prune -a
```
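Because `docker system prune` also removes stopped containers, it is worth listing them first and checking that nothing on the keep list (for example Jellyfin or MariaDB, which are still undecided) is among them. A minimal sketch, with `stopped_containers` as a hypothetical helper name:

```bash
# Hypothetical helper: list containers that are not running
# (exited, or created but never started). A system prune would delete these.
stopped_containers() {
  docker ps -a \
    --filter status=exited \
    --filter status=created \
    --format '{{.Names}}\t{{.Image}}\t{{.Status}}'
}

# Usage: stopped_containers
```

If the goal is only to reclaim image space after removing the chosen containers with `docker rm`, `docker image prune -a` is a narrower alternative: it deletes images that no container (running or stopped) uses and leaves the containers themselves alone.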

### Phase 3: Decisions (This Week)

7. **Guacamole + MariaDB** - Keep or remove?
8. **Jellyfin** - Restart with GPU or remove?
9. **Productivity tools** - Choose one, remove others
10. **Database admin** - CloudBeaver or adminer?

---

## 📊 Storage Cleanup Impact

**Current Cache Usage:** 578GB / 932GB (63%)

**After Recommended Cleanup:**
- Remove ebook2audiobook: -20GB
- Remove docling-serve: -14.5GB
- Remove Nextcloud AIO: -7GB
- Docker system prune: ~10-20GB
- **Total Freed: ~50-60GB**

**New Cache Usage:** ~520GB / 932GB (56%) ✅

---
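The projected cache usage in the Storage Cleanup Impact section can be re-derived from its own figures: 20 + 14.5 + 7 = 41.5GB from removals, plus the estimated 10-20GB from the prune. The sketch below just redoes that arithmetic; `cache_after_cleanup` is a hypothetical helper, and the inputs are the document's numbers, not new measurements.

```bash
# Hypothetical helper: print projected cache usage after freeing some space.
# Arguments: <used GB> <total GB> <freed GB>
cache_after_cleanup() {
  local used_gb="$1" total_gb="$2" freed_gb="$3"
  awk -v u="$used_gb" -v t="$total_gb" -v f="$freed_gb" \
    'BEGIN { printf "%.1fGB / %dGB (%.0f%%)\n", u - f, t, (u - f) / t * 100 }'
}

# Low end of the estimate: 41.5GB removals + 10GB prune
cache_after_cleanup 578 932 51.5   # prints: 526.5GB / 932GB (56%)
# High end of the estimate: 41.5GB removals + 20GB prune
cache_after_cleanup 578 932 61.5   # prints: 516.5GB / 932GB (55%)
```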

## 🔐 Security Recommendations

1. **Secrets Management** - Stop using plain text env vars
2. **Close Open Signups** - Vaultwarden signups should be closed
3. **SSH Port Conflict** - Fix Gitea port 22 conflict
4. **Network Mode** - Move Jellyfin from `host` to `bridge`
5. **Version Pinning** - Stop using `latest` tags

---
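For the version-pinning recommendation, a first step is finding which containers still reference an untagged or `latest` image. The filter below is a minimal sketch: `unpinned_images` is a hypothetical name, and it reads `name<TAB>image` lines on stdin so it can be fed from `docker ps`.

```bash
# Hypothetical helper: read "name<TAB>image" lines and print the ones whose
# image has no tag or uses :latest. Images pinned by digest are skipped.
unpinned_images() {
  awk -F'\t' '
    $2 ~ /@sha256:/ { next }   # pinned by digest
    {
      # Only inspect the last path component, so a registry port
      # such as localhost:5000/app is not mistaken for a tag.
      n = split($2, parts, "/")
      if (parts[n] !~ /:/ || parts[n] ~ /:latest$/) print
    }'
}

# Usage:
#   docker ps -a --format '{{.Names}}\t{{.Image}}' | unpinned_images
```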

## 📈 Resource Summary

**Docker Images Total:** ~50GB
**Container Data:** Varies by appdata
**Cache Impact:** High (63% full)

**Top Resource Consumers (Images):**
1. ebook2audiobook: 20.06GB
2. docling-serve: 14.45GB
3. Nextcloud stack: ~7GB
4. open-webui: 4.55GB
5. OpenProject: 2.87GB

---

## 🎓 Key Takeaways

1. **6 services are your core** - Keep these running
2. **26 stopped containers** - Cleanup opportunity
3. **~40GB can be freed** - From the three largest stacks alone; ~50-60GB with a system prune
4. **No monitoring** - Critical gap (restart Grafana stack!)
5. **Backup critical** - Vaultwarden and Gitea MUST be backed up

---

**Last Updated:** October 31, 2025
**Next Review:** After cleanup actions completed
**Maintained By:** Weston