Added comprehensive homelab documentation:
README.md:
- Hardware inventory and specifications
- Network architecture overview
- Running services catalog
- Quick reference commands
- Project goals and roadmap
docs/network-map.md:
- All device IP assignments
- Port reference guide
- DNS configuration (Pi-hole + Unbound)
- Remote access setup (Tailscale + Cloudflare)
- Troubleshooting commands
docs/service-inventory.md:
- All 32 Docker containers cataloged
- Running services analysis (6 containers)
- Stopped services review (26 containers)
- Resource usage and recommendations
- Container decision matrix
- Cleanup plan to free 40GB
- Security recommendations
- Prioritized action plan
docs/quick-start.md:
- Emergency recovery procedures
- Service restart sequences
- Backup/restore guides with scripts
- Troubleshooting by scenario
- Health check automation
- Post-recovery checklist
- Common problem solutions
This establishes the foundation for all future homelab projects.
Phase 1 documentation complete! 🎉
# 🚀 Quick Start & Emergency Recovery Guide

**Purpose:** Get your homelab back online quickly after disaster
**Target Time:** 30-60 minutes to basic functionality
**Last Updated:** October 31, 2025

---

## 🎯 Quick Access Reference

### Essential URLs

| Service | URL | Default Credentials |
|---------|-----|---------------------|
| **Unraid Dashboard** | http://192.168.68.51 | root / (your password) |
| **Gitea** | https://gitea.segelschiff.app | Weston / (your password) |
| **Vaultwarden** | http://192.168.68.51:4743 | Master password |
| **NPM Admin** | http://192.168.68.51:7818 | admin@example.com / changeme (first login) |
| **Pi-hole** | http://192.168.68.61/admin | (your password) |
| **PiKVM** | https://192.168.68.53 | admin / admin (default) |

### SSH Access

```bash
# Local network
ssh root@192.168.68.51

# Via Tailscale (from anywhere)
ssh root@100.122.220.126

# Emergency: Use PiKVM for console access
# https://192.168.68.53
```
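
The access paths above can be tried in order. The following is a minimal sketch, assuming Linux `ping` on the client and that an ICMP reply is a good enough proxy for SSH being reachable. The function name `pick_ssh_target` is hypothetical and not part of the original setup.

```bash
#!/bin/bash
# Hypothetical helper: print the first address that answers a ping.
# Pass the LAN and Tailscale addresses listed above, in order of preference.
pick_ssh_target() {
    local host
    for host in "$@"; do
        # -c 1: send one probe; -W 2: wait up to 2 seconds (Linux ping semantics)
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host"
            return 0
        fi
    done
    return 1   # nothing answered: fall back to the PiKVM console
}

# Usage (not run automatically):
#   target=$(pick_ssh_target 192.168.68.51 100.122.220.126) && ssh "root@$target"
```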

---

## 🆘 Emergency Recovery Scenarios

### Scenario 1: Server Won't Boot 🚨

**Symptoms:**
- No network connectivity to 192.168.68.51
- Unraid WebUI unreachable
- No response to ping

**Recovery Steps:**

1. **Physical Check** (via PiKVM or in person)
   ```
   [ ] Server has power (check LED)
   [ ] Network cable connected to eth0
   [ ] Monitor shows output (via PiKVM)
   [ ] USB boot drive is present and detected
   ```

2. **Use PiKVM for Remote Console**
   - Access: https://192.168.68.53
   - Login: admin / admin
   - View boot process
   - Check BIOS/boot messages

3. **Common Boot Issues**

   **USB Boot Drive Failure** (Most common!)
   ```
   Symptoms: "Boot device not found" or similar

   Fix:
   1. Have backup USB ready
   2. Shut down server (via PiKVM power control)
   3. Replace USB boot drive
   4. Power on
   5. Restore configuration from backup
   ```

   **BIOS Settings Changed**
   ```
   Fix:
   1. Enter BIOS (DEL/F2 during boot)
   2. Load defaults
   3. Verify boot order (USB first)
   4. Save and exit
   ```

   **Hardware Failure**
   ```
   Check:
   1. RAM seated properly
   2. All drives detected in BIOS
   3. CPU fan spinning
   4. No error beeps
   ```

4. **Boot from Backup USB**
   ```
   Steps:
   1. Power off server
   2. Insert backup USB boot drive
   3. Power on
   4. Verify boot successful
   5. Restore configuration:
      - Tools → Flash Backup → Browse → Select backup ZIP
      - Reboot
   ```

**Prevention:**
- ✅ Keep USB flash backup updated (weekly)
- ✅ Store backup USB in safe location
- ✅ Document BIOS settings (screenshots via PiKVM)

---

### Scenario 2: Lost Admin Password

**Unraid Root Password Reset:**

1. **Via PiKVM Console**
   ```
   1. Access PiKVM: https://192.168.68.53
   2. View console in browser
   3. Wait for login prompt
   4. Press Ctrl+Alt+F2 (via PiKVM keyboard)
   5. At terminal: passwd root
   6. Enter new password twice
   7. Press Ctrl+Alt+F1 to return to GUI
   8. Update documentation
   ```

2. **Via Physical Access**
   ```
   1. Connect monitor and keyboard to server
   2. Press Ctrl+Alt+F2
   3. Run: passwd root
   4. Set new password
   5. Press Ctrl+Alt+F1
   ```

Both methods assume a root shell is still available on the console (for example, a session that is already logged in). `passwd` cannot be run from a bare login prompt, so if no root session exists, follow Unraid's official password reset procedure, which is carried out on the USB flash drive from another computer.

**Container Passwords:**
- Check `/mnt/user/appdata/<service>/config`
- Review environment variables in Docker templates
- Use Vaultwarden if accessible
- Check this documentation repo in Gitea

---

### Scenario 3: Container Won't Start

**Quick Diagnosis:**

```bash
# Check container status
docker ps -a | grep <container_name>

# View recent logs
docker logs --tail 100 <container_name>

# Look for errors
docker inspect <container_name> | grep -i error
```
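
Grepping the full `docker inspect` output works, but the state fields can also be read directly. This is a minimal sketch using `docker inspect --format` with Go template fields from the container's `State` object; the function name `container_state` is hypothetical. As a rule of thumb, an exit code of 137 means the process was killed with SIGKILL (128 + 9), which often points to an out-of-memory kill.

```bash
#!/bin/bash
# Hypothetical helper: one-line summary of why a container is not running.
container_state() {
    local name="$1"
    # Status, ExitCode, OOMKilled and Error are fields of the State object
    # in `docker inspect` output; the format string is a Go template.
    docker inspect --format \
        'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}' \
        "$name"
}

# Usage (not run automatically):
#   container_state vaultwarden
```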

**Common Fixes:**

**Port Conflict:**
```bash
# Find what's using the port
netstat -tulpn | grep <port>

# Example: Port 3000 already in use
netstat -tulpn | grep 3000

# Stop conflicting service
docker stop <conflicting_container>
```

**Volume Permission Issues:**
```bash
# Check ownership
ls -la /mnt/user/appdata/<container_name>

# Fix permissions (Unraid standard: 99:100)
chown -R 99:100 /mnt/user/appdata/<container_name>

# Example: Fix Vaultwarden
chown -R 99:100 /mnt/user/appdata/vaultwarden
```

**Dependency Missing:**
```bash
# Example: Guacamole needs MariaDB
docker start mariadb
sleep 10  # Wait for database initialization
docker start ApacheGuacamole

# Verify dependency is running
docker ps | grep mariadb
```
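
A fixed `sleep 10` is a guess at how long the database needs. One alternative is to wait until the dependency's port actually accepts connections. The sketch below uses bash's built-in `/dev/tcp` redirection; the function name `wait_for_port` is hypothetical, and the port in the usage comment (3306, MariaDB's default) assumes the database port is reachable from where the script runs.

```bash
#!/bin/bash
# Hypothetical helper: wait until a TCP port accepts connections.
# Relies on bash's /dev/tcp pseudo-device, so no extra tools are required.
wait_for_port() {
    local host="$1" port="$2" timeout="${3:-30}" waited=0
    # The subshell opens a connection and closes it again when it exits.
    until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
        waited=$((waited + 1))
        if [ "$waited" -ge "$timeout" ]; then
            return 1   # gave up after roughly $timeout attempts
        fi
        sleep 1
    done
    return 0
}

# Usage (not run automatically; adjust host and port to your mapping):
#   docker start mariadb && wait_for_port 192.168.68.51 3306 60 && docker start ApacheGuacamole
```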

**Resource Exhaustion:**
```bash
# Check cache usage
df -h /mnt/cache

# If cache full (>90%), clean up
docker system prune -a  # ⚠️ REMOVES ALL STOPPED CONTAINERS AND UNUSED IMAGES!

# Or free space manually
# See service-inventory.md for cleanup recommendations
```
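
The 90% threshold above can be checked by a script instead of by eye. This is a minimal sketch built on POSIX `df -P`; the function names `usage_percent` and `check_space` are hypothetical, and it assumes the filesystem name reported by `df` contains no spaces.

```bash
#!/bin/bash
# Hypothetical helper: print the used-space percentage of the filesystem
# that holds a path, as a bare integer.
usage_percent() {
    # -P forces one line per filesystem; column 5 is the capacity (e.g. 91%).
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical wrapper: warn when usage is above a limit (default 90).
check_space() {
    local path="$1" limit="${2:-90}" used
    used=$(usage_percent "$path")
    [ -n "$used" ] || return 2   # df produced nothing usable
    if [ "$used" -gt "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}

# Usage (not run automatically):
#   check_space /mnt/cache 90
```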

---

### Scenario 4: Network Connectivity Issues

**Can't Access from LAN:**

```bash
# SSH into Unraid (via PiKVM if network down)
ssh root@192.168.68.51

# Check if br0 is up
ip addr show br0
# Should show: 192.168.68.51/22

# Verify IP and routes
ip route | grep default
# Should show: default via 192.168.68.1

# Test router connectivity
ping -c 3 192.168.68.1

# Test internet
ping -c 3 8.8.8.8

# Test DNS (Pi-hole)
nslookup google.com 192.168.68.61
```
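
The `/22` in the expected `ip addr` output means the first 22 bits identify the network, so this LAN spans 192.168.68.0 through 192.168.71.255 (1024 addresses). If `br0` shows a different prefix, hosts in part of that range may become unreachable. The arithmetic can be sketched in plain bash; the function name `network_of` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the network address for an IPv4 address and prefix.
network_of() {
    local ip="$1" prefix="$2" a b c d n mask net
    IFS=. read -r a b c d <<< "$ip"
    # Pack the four octets into one 32-bit integer.
    n=$(( (a << 24) | (b << 16) | (c << 8) | d ))
    # Netmask: the top $prefix bits set, the remaining bits cleared.
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( n & mask ))
    echo "$(( net >> 24 & 255 )).$(( net >> 16 & 255 )).$(( net >> 8 & 255 )).$(( net & 255 ))"
}

network_of 192.168.68.51 22   # prints 192.168.68.0
```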

**Fix Network Issues:**

```bash
# Restart networking (from console/PiKVM)
/etc/rc.d/rc.inet1 restart

# If that doesn't work, reboot
reboot
```

**Can't Access Containers:**

```bash
# Check Docker network
docker network inspect bridge

# Verify container IP
docker inspect <container_name> | grep IPAddress

# Test from Unraid host
curl http://172.17.0.5:8080  # Example: open-webui

# Test port mapping
curl http://192.168.68.51:3000  # Should reach open-webui
```
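
Bridge IPs such as 172.17.0.5 can change when containers restart, so it may be safer to look the address up each time rather than hard-code it. A minimal sketch, assuming the container is attached to at least one Docker network; the function name `container_ip` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the IP address Docker assigned to a container
# on each network it is attached to (space separated).
container_ip() {
    docker inspect --format \
        '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' "$1"
}

# Usage (not run automatically; 8080 is the example internal port from above):
#   curl -I "http://$(container_ip open-webui | awk '{print $1}'):8080"
```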

**DNS Not Resolving:**

```bash
# Test Pi-hole directly
nslookup google.com 192.168.68.61

# If Pi-hole down, check Pi Zero
ping 192.168.68.61

# SSH to Pi-hole
ssh pi@192.168.68.61

# Check Pi-hole status
pihole status

# Restart if needed
pihole restartdns
```

---

### Scenario 5: Array Won't Start

**Symptoms:**
- Unraid GUI accessible but array shows "Stopped"
- Disks show errors or are missing

**Troubleshooting:**

```bash
# Check disk health
smartctl -a /dev/sdb  # Parity
smartctl -a /dev/sdc  # Disk 1

# View disk assignments
cat /boot/config/disk.cfg

# Check for filesystem errors (-n = read-only check; start the array in Maintenance mode first)
xfs_repair -n /dev/md1p1
```

**Common Causes:**
- Parity sync in progress (wait for completion)
- Disk failed (check SMART, may need replacement)
- Unclean shutdown (filesystem check required)
- Disk assignment changed

**Recovery:**

1. **Start Array in Maintenance Mode**
   - Click "Start" in Unraid GUI
   - Select "Maintenance mode" if prompted
   - Run filesystem check if prompted

2. **Review Logs**
   - Settings → System Log
   - Look for disk errors
   - Check for power events

3. **If Disk Failed**
   - Follow Unraid disk replacement procedure
   - Do NOT format or write to disk unnecessarily
   - Seek help in Unraid forums if uncertain

---

## 🔧 Critical Service Restart Procedures

### Restart Core Services (Proper Order)

**1. Infrastructure First:**
```bash
# Start reverse proxy (for routing)
docker start NginxProxyManager

# Wait for it to be ready
sleep 5
docker ps | grep NginxProxyManager

# Start tunnel (for remote access)
docker start Cloudflared

# Verify both running
docker ps | grep -E "NginxProxyManager|Cloudflared"
```

**2. Security Services:**
```bash
# Password manager (critical!)
docker start vaultwarden

# Wait for healthy status
sleep 10
docker ps | grep vaultwarden
# Should show "(healthy)"

# If not healthy, check logs
docker logs --tail 50 vaultwarden
```

**3. Development Tools:**
```bash
# Git server
docker start Gitea

# Wait for initialization
sleep 5

# Remote access gateway
docker start ApacheGuacamole
# Note: Needs MariaDB if configured
```

**4. Monitoring (IMPORTANT!):**
```bash
# Database first
docker start Influxdb

# Wait for DB to initialize
sleep 15

# Then metrics collector
docker start Telegraf

# Finally visualization
docker start Grafana

# Verify all running
docker ps | grep -E "Influxdb|Telegraf|Grafana"
```

**5. Optional Services:**
```bash
# LLM backend
docker start ollama
sleep 10

# LLM interface
docker start open-webui

# Wait for healthy
docker ps | grep open-webui
```
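
Several steps above sleep for a fixed time and then look for "(healthy)" by eye. For containers that define a health check, the status can be polled instead. This is a sketch under that assumption; the function name `wait_healthy` is hypothetical, and containers without a health check are reported rather than waited on.

```bash
#!/bin/bash
# Hypothetical helper: poll a container until Docker reports it as "healthy".
wait_healthy() {
    local name="$1" timeout="${2:-60}" waited=0 status
    while :; do
        # Containers without a HEALTHCHECK have no .State.Health, hence the guard.
        status=$(docker inspect --format \
            '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
            "$name" 2>/dev/null)
        case "$status" in
            healthy) return 0 ;;
            none)    echo "$name defines no health check" >&2; return 2 ;;
        esac
        if [ "$waited" -ge "$timeout" ]; then
            echo "$name still '$status' after ${timeout}s" >&2
            return 1
        fi
        sleep 2
        waited=$((waited + 2))
    done
}

# Usage (not run automatically):
#   docker start vaultwarden && wait_healthy vaultwarden 60
```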

---

### Stop All Services Gracefully

```bash
# Stop all running containers
docker stop $(docker ps -q)

# Verify all stopped
docker ps
# Should show empty output

# Wait before stopping array
sleep 5

# Stop array (from GUI)
# Main → Array Operation → Stop
```

---

## 📦 Backup & Restore Procedures

### USB Flash Backup (Unraid Configuration)

**Create Backup:**
1. Navigate to: **Main → Flash → Flash Backup**
2. Click "Backup Now"
3. Download ZIP file (e.g., `unraid-flash-backup-20251031.zip`)
4. Store securely OFF-SERVER:
   - OneDrive: `/z_Unraid/Backups/`
   - External drive
   - Cloud storage

**Restore from Backup:**
```
1. Format new USB drive (if needed)
2. Copy backup ZIP to new USB
3. Extract contents to root of USB
   - config/ directory
   - bzimage, bzroot, etc.
4. Safely eject USB
5. Boot from new USB
6. Configuration restored automatically
```

**Frequency:**
- Weekly minimum
- After ANY configuration change
- Before major updates

---

### Container Data Backup

**Critical Directories:**

```
Priority 1 (CRITICAL):
  /mnt/user/appdata/vaultwarden/         🚨 Your passwords!
  /mnt/user/appdata/gitea/               🚨 Your code repositories!

Priority 2 (Important):
  /mnt/user/appdata/NginxProxyManager/   Proxy configs
  /mnt/user/appdata/Grafana/             Dashboards
  /mnt/user/appdata/Influxdb/            Metrics history

Priority 3 (Optional):
  /mnt/user/appdata/open-webui/          LLM chat history
```

**Quick Backup Script:**

```bash
#!/bin/bash
# Save as: /mnt/user/scripts/backup-critical.sh

BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

echo "Stopping containers..."
docker stop vaultwarden Gitea NginxProxyManager

echo "Backing up data..."
tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" /mnt/user/appdata/vaultwarden
tar -czf "$BACKUP_DIR/gitea.tar.gz" /mnt/user/appdata/gitea
tar -czf "$BACKUP_DIR/npm.tar.gz" /mnt/user/appdata/NginxProxyManager

echo "Restarting containers..."
docker start vaultwarden Gitea NginxProxyManager

echo "✅ Backup complete: $BACKUP_DIR"
ls -lh "$BACKUP_DIR"
```

**Make Executable:**
```bash
chmod +x /mnt/user/scripts/backup-critical.sh
```

**Run Manually:**
```bash
/mnt/user/scripts/backup-critical.sh
```

**Schedule (User Scripts Plugin):**
- Frequency: Daily at 2 AM
- Retention: Keep last 30 days

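
The backup script above never deletes old runs, so the 30-day retention has to be enforced separately. The sketch below is one way to do it, assuming `/mnt/user/backups` contains only the timestamped directories that the script creates; the function name `prune_backups` is hypothetical. For the schedule, "daily at 2 AM" corresponds to the cron expression `0 2 * * *`.

```bash
#!/bin/bash
# Hypothetical helper: delete per-run backup directories older than N days.
prune_backups() {
    local root="$1" days="${2:-30}"
    # Refuse to run against an empty or missing path.
    [ -n "$root" ] && [ -d "$root" ] || return 1
    # -mindepth/-maxdepth 1: only directories directly under $root.
    # -mtime +N: last modified more than N full days ago.
    find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -exec rm -rf {} +
}

# Usage (not run automatically), e.g. as the last line of backup-critical.sh:
#   prune_backups /mnt/user/backups 30
```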
---

**Restore from Backup:**

```bash
# Example: Restore Vaultwarden
docker stop vaultwarden

# Backup current (corrupted) data
mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old

# Extract backup
tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C /

# Restart container
docker start vaultwarden

# Verify working
curl http://192.168.68.51:4743
```
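
Before moving the live data aside, it can be worth confirming that the archive is readable from start to finish. A minimal sketch, assuming GNU or BSD `tar` with gzip support; the function name `verify_archive` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: confirm a .tar.gz can be read end to end before restoring.
verify_archive() {
    local archive="$1"
    [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
    # -t lists members without extracting; a truncated or corrupt archive
    # makes tar exit with a non-zero status.
    tar -tzf "$archive" >/dev/null 2>&1
}

# Usage (not run automatically):
#   verify_archive /mnt/user/backups/20251031_120000/vaultwarden.tar.gz && echo "archive OK"
```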

---

## ⚡ Quick Commands Reference

### System Status

```bash
# System uptime and load
uptime

# Resource usage
free -h
df -h

# Array status
mdcmd status | grep mdState

# Docker container summary
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Temperature (if sensors installed)
sensors

# Disk health quick check
smartctl -H /dev/sdb  # Parity
smartctl -H /dev/sdc  # Disk 1
```

### Docker Quick Commands

```bash
# Start every stopped container (⚠️ includes ones you stopped on purpose)
docker start $(docker ps -aq -f status=exited)

# Stop all running containers
docker stop $(docker ps -q)

# View logs (last 50 lines)
docker logs --tail 50 <container_name>

# Follow logs in real-time
docker logs -f <container_name>

# Restart container
docker restart <container_name>

# Remove container (⚠️ will lose non-volume data!)
docker rm <container_name>

# Clean up unused resources
docker system prune            # ⚠️ Removes ALL stopped containers, unused networks, dangling images
docker system prune -a         # ⚠️ Also removes every image not used by a container
docker system prune --volumes  # ⚠️ Also removes unused volumes!
```

### Network Diagnostics

```bash
# Check all interfaces
ip addr show

# Test key infrastructure
ping -c 3 192.168.68.1   # Router
ping -c 3 192.168.68.51  # Unraid
ping -c 3 192.168.68.61  # Pi-hole
ping -c 3 8.8.8.8        # Internet

# DNS resolution test
nslookup google.com
nslookup google.com 192.168.68.61  # Test Pi-hole specifically

# Check listening ports
netstat -tulpn | grep LISTEN

# Test specific port
nc -zv 192.168.68.51 3002  # Example: Gitea
curl -I http://192.168.68.51:3002  # HTTP test
```

### Quick Health Check Script

```bash
#!/bin/bash
# Save as: /mnt/user/scripts/health-check.sh

echo "=== Unraid Health Check ==="
echo ""

echo "1. Array Status:"
mdcmd status | grep mdState

echo ""
echo "2. Running Containers:"
docker ps --format "table {{.Names}}\t{{.Status}}"

echo ""
echo "3. Disk Usage:"
df -h | grep -E "cache|disk1|Filesystem"

echo ""
echo "4. Network Connectivity:"
ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo "  Router: ✅ OK" || echo "  Router: ❌ FAIL"
ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo "  Internet: ✅ OK" || echo "  Internet: ❌ FAIL"
ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo "  Pi-hole: ✅ OK" || echo "  Pi-hole: ❌ FAIL"

echo ""
echo "5. Critical Services:"
# --max-time keeps a hung service from stalling the whole check
curl -s --max-time 5 http://localhost:4743 >/dev/null && echo "  Vaultwarden: ✅ OK" || echo "  Vaultwarden: ❌ DOWN"
curl -s --max-time 5 http://localhost:3002 >/dev/null && echo "  Gitea: ✅ OK" || echo "  Gitea: ❌ DOWN"
curl -s --max-time 5 http://localhost:7818 >/dev/null && echo "  NPM: ✅ OK" || echo "  NPM: ❌ DOWN"

echo ""
echo "=== Health Check Complete ==="
```

**Run:** `bash /mnt/user/scripts/health-check.sh`

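
The script above prints a result for each check but always exits with status 0, so a scheduler cannot tell whether anything failed. One possible refinement is a small wrapper that counts failures. This is a sketch only; the names `check` and `FAILURES` are hypothetical and not part of the original script.

```bash
#!/bin/bash
# Hypothetical wrapper: run a command, print OK or FAIL, and count failures.
FAILURES=0
check() {
    local label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "  $label: OK"
    else
        echo "  $label: FAIL"
        FAILURES=$((FAILURES + 1))
    fi
}

# Usage inside health-check.sh (not run automatically):
#   check "Router"      ping -c 2 192.168.68.1
#   check "Vaultwarden" curl -s --max-time 5 http://localhost:4743
#   exit "$FAILURES"    # non-zero when at least one check failed
```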
---

## 📞 Getting Help

### Pre-flight Checks

Before asking for help, gather this information:

1. **System Diagnostics**
   - Unraid WebGUI: Tools → Diagnostics → Download
   - Creates ZIP with all logs

2. **Container Logs**
   ```bash
   # 2>&1 captures stderr too, where many containers write their errors
   docker logs <container_name> > container-logs.txt 2>&1
   ```

3. **Network Configuration**
   ```bash
   ip addr show > network-config.txt
   ip route show >> network-config.txt
   ```

4. **Disk Status**
   ```bash
   smartctl -a /dev/sdb > disk-smart.txt
   smartctl -a /dev/sdc >> disk-smart.txt
   ```

### Community Resources

- **Unraid Forums:** https://forums.unraid.net/
  - Post diagnostics ZIP
  - Be specific about symptoms
  - Include what you've tried

- **r/unraid:** https://reddit.com/r/unraid
  - Quick questions
  - Share diagnostics in pastebin

- **Discord:** Unraid Official Discord
  - Real-time help
  - Active community

### Emergency Contacts

```
ISP Support: [Your ISP Phone Number]
Unraid License: [Store in secure location]
USB Backup Location: [Document where stored]
Off-site Backup: [If applicable]
```

---

## 🎓 Post-Recovery Checklist

After restoring from disaster:

```
[ ] Unraid array started successfully
[ ] All critical services running
    [ ] NginxProxyManager
    [ ] Cloudflared
    [ ] Vaultwarden
    [ ] Gitea
[ ] Network connectivity verified
    [ ] Can access Unraid WebUI
    [ ] Can ping router (192.168.68.1)
    [ ] Internet working
    [ ] DNS resolving (Pi-hole)
[ ] Vaultwarden accessible (test password retrieval)
[ ] Gitea accessible (verify repositories intact)
[ ] NPM routing working (test reverse proxy)
[ ] Monitoring stack restarted
    [ ] Grafana
    [ ] InfluxDB
    [ ] Telegraf
[ ] External access working
    [ ] Tailscale connected
    [ ] Cloudflare tunnel active
[ ] Backups verified and up-to-date
[ ] Documentation updated with lessons learned
[ ] Incident documented in change log (Gitea)
```

---

## 🔒 Security After Recovery

**Immediately After Disaster Recovery:**

1. **Change Passwords** (if compromise suspected)
   ```
   [ ] Unraid root password
   [ ] Vaultwarden master password
   [ ] Container admin passwords
   [ ] Pi-hole admin password
   [ ] PiKVM password
   ```

2. **Review Access Logs**
   ```bash
   # Check SSH attempts (Unraid writes sshd messages to the system log)
   grep "Failed password" /var/log/syslog | tail -50

   # Check NPM access (2>&1 so errors written to stderr are searched too)
   docker logs NginxProxyManager 2>&1 | grep -i error

   # Check Gitea access
   docker logs Gitea 2>&1 | grep -i login
   ```

3. **Verify Firewall Rules**
   ```bash
   iptables -L -n -v
   ```

4. **Check for Unauthorized Changes**
   ```bash
   # Review Docker containers
   docker ps -a

   # Check cron jobs
   crontab -l

   # Review network interfaces
   ip addr show
   ```
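
When reviewing SSH attempts, a per-address count is usually easier to read than the raw log lines. A minimal sketch, assuming the standard OpenSSH message format (`Failed password for ... from <IP> port ...`); the function name `failed_ssh_by_ip` and the default log path are assumptions, so pass the path that applies to your system.

```bash
#!/bin/bash
# Hypothetical helper: count failed SSH password attempts per source address.
failed_ssh_by_ip() {
    local log="${1:-/var/log/syslog}"
    # The address is the word that follows "from" in each sshd message.
    grep "Failed password" "$log" \
        | awk '{ for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1) }' \
        | sort | uniq -c | sort -rn
}

# Usage (not run automatically):
#   failed_ssh_by_ip /var/log/syslog | head -20
```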

---

## 📝 Documentation Updates After Incident

**What to Document:**

1. **What Happened:**
   - Date/time of incident
   - Symptoms observed
   - Root cause (if determined)
   - Duration of outage

2. **What You Did:**
   - Steps taken to recover
   - What worked / didn't work
   - Resources used (forums, docs, etc.)
   - Time to recovery

3. **Lessons Learned:**
   - What could prevent this in future
   - Process improvements needed
   - Documentation gaps discovered
   - Backup improvements needed

4. **Action Items:**
   - Backups to implement/improve
   - Monitoring to add
   - Scripts to create
   - Hardware to replace/upgrade

**Where to Document:**
- Create incident report: `docs/incidents/YYYY-MM-DD-incident-name.md`
- Update this quick-start guide with new procedures
- Add to troubleshooting section if recurring issue
- Commit to Gitea with detailed message

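
To lower the friction of writing a report, the file can be scaffolded with the four sections listed above. This is a sketch, assuming it is run from the repository root; the function name `new_incident` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: create docs/incidents/YYYY-MM-DD-<name>.md with empty
# sections matching the "What to Document" list.
new_incident() {
    local name="$1" dir="${2:-docs/incidents}" file
    [ -n "$name" ] || { echo "usage: new_incident <incident-name> [dir]" >&2; return 1; }
    file="$dir/$(date +%Y-%m-%d)-$name.md"
    mkdir -p "$dir" || return 1
    # Never overwrite an existing report.
    [ -e "$file" ] && { echo "already exists: $file" >&2; return 1; }
    cat > "$file" <<EOF
# Incident: $name

## What Happened

## What You Did

## Lessons Learned

## Action Items
EOF
    echo "$file"
}

# Usage (not run automatically):
#   new_incident usb-boot-failure
```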
---

## 🚀 Normal Startup Sequence

**From Cold Boot:**

```
1. Power on server
   ↓
2. BIOS POST (~30 seconds)
   - Hardware check
   - Memory test
   - Drive detection
   ↓
3. Unraid boots from USB (~1-2 minutes)
   - Linux kernel loads
   - Unraid OS starts
   ↓
4. Network initializes
   - br0 interface up
   - Gets IP: 192.168.68.51
   ↓
5. Array auto-starts (if configured)
   - Parity disk: sdb
   - Data disk: sdc
   - Cache: nvme1n1p1
   ↓
6. Docker service starts
   - docker0 bridge created
   - Networks initialized
   ↓
7. Containers auto-start (if enabled)
   - Infrastructure services first
   - Then application services
   ↓
8. Services available (~3-5 minutes total)
   ✅ Ready to use!
```

**Expected Boot Time:** 3-5 minutes
**If Taking Longer:** Check system log for errors

---

## 🎯 Quick Health Check Command

**Run After Any Restart:**

```bash
# Quick one-liner health check
# (steps are joined with ";" so a failed docker or df step is not reported as a network failure)
docker ps --format "table {{.Names}}\t{{.Status}}"; \
df -h | grep -E "cache|disk1"; \
ping -c 2 192.168.68.1 >/dev/null && echo "Network: OK" || echo "Network: FAIL"
```

---

## 📚 Related Documentation

- **Network Issues:** See `network-map.md`
- **Service Details:** See `service-inventory.md`
- **Container Configs:** See `docker-compose/` (when created)
- **Main Overview:** See `README.md`

---

## 🆘 True Emergency - Complete System Down

**If everything is down and you need immediate help:**

1. **Access via PiKVM**
   - https://192.168.68.53
   - Get console access
   - View what's happening

2. **Check Physical Server**
   - Power LED on?
   - Fans spinning?
   - Drives spinning up?
   - Network activity lights?

3. **Try Safe Mode Boot**
   - Boot Unraid in Safe Mode from the boot menu (plugins are not loaded; pick the GUI Safe Mode entry if you want a local desktop)
   - Diagnose from console

4. **Community Help**
   - Unraid Discord (fastest response)
   - Forums with diagnostics ZIP
   - r/unraid for quick questions

5. **Document Everything**
   - Take photos/screenshots via PiKVM
   - Note exact error messages
   - Record what you tried
   - Timeline of events

---

## 💡 Pro Tips

1. **Test Your Backups**
   - Restore test annually
   - Verify data integrity
   - Practice recovery procedures

2. **Keep This Guide Accessible**
   - Save offline copy to phone/laptop
   - Print critical sections
   - Bookmark in browser

3. **Automate Where Possible**
   - Schedule backup scripts
   - Set up monitoring alerts
   - Use User Scripts plugin

4. **Document As You Go**
   - Update after fixing issues
   - Add new procedures discovered
   - Note what worked/didn't work

---

**Last Updated:** October 31, 2025
**Next Review:** Quarterly or after incidents
**Maintained By:** Weston

---

**Remember:** Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help!

**Keep this guide accessible even when the server is down!**
💡 **Pro Tip:** Save a copy to your phone/laptop/OneDrive!

🚀 **You've got this!**