Phase 1 Complete: Foundation documentation
Added comprehensive homelab documentation:
README.md:
- Hardware inventory and specifications
- Network architecture overview
- Running services catalog
- Quick reference commands
- Project goals and roadmap
docs/network-map.md:
- All device IP assignments
- Port reference guide
- DNS configuration (Pi-hole + Unbound)
- Remote access setup (Tailscale + Cloudflare)
- Troubleshooting commands
docs/service-inventory.md:
- All 32 Docker containers cataloged
- Running services analysis (6 containers)
- Stopped services review (26 containers)
- Resource usage and recommendations
- Container decision matrix
- Cleanup plan to free 40GB
- Security recommendations
- Prioritized action plan
docs/quick-start.md:
- Emergency recovery procedures
- Service restart sequences
- Backup/restore guides with scripts
- Troubleshooting by scenario
- Health check automation
- Post-recovery checklist
- Common problem solutions
This establishes the foundation for all future homelab projects.
Phase 1 documentation complete! 🎉
docs/quick-start.md

# 🚀 Quick Start & Emergency Recovery Guide

**Purpose:** Get your homelab back online quickly after a disaster
**Target Time:** 30-60 minutes to basic functionality
**Last Updated:** October 31, 2025

---

## 🎯 Quick Access Reference

### Essential URLs

| Service | URL | Default Credentials |
|---------|-----|---------------------|
| **Unraid Dashboard** | http://192.168.68.51 | root / (your password) |
| **Gitea** | https://gitea.segelschiff.app | Weston / (your password) |
| **Vaultwarden** | http://192.168.68.51:4743 | Master password |
| **NPM Admin** | http://192.168.68.51:7818 | admin@example.com / changeme (first login) |
| **Pi-hole** | http://192.168.68.61/admin | (your password) |
| **PiKVM** | https://192.168.68.53 | admin / admin (default) |

### SSH Access

```bash
# Local network
ssh root@192.168.68.51

# Via Tailscale (from anywhere)
ssh root@100.122.220.126

# Emergency: Use PiKVM for console access
# https://192.168.68.53
```

---

## 🆘 Emergency Recovery Scenarios

### Scenario 1: Server Won't Boot 🚨

**Symptoms:**
- No network connectivity to 192.168.68.51
- Unraid WebUI unreachable
- No response to ping

**Recovery Steps:**

1. **Physical Check** (via PiKVM or in person)
   ```
   [ ] Server has power (check LED)
   [ ] Network cable connected to eth0
   [ ] Monitor shows output (via PiKVM)
   [ ] USB boot drive is present and detected
   ```

2. **Use PiKVM for Remote Console**
   - Access: https://192.168.68.53
   - Login: admin / admin
   - View boot process
   - Check BIOS/boot messages

3. **Common Boot Issues**

   **USB Boot Drive Failure** (Most common!)
   ```
   Symptoms: "Boot device not found" or similar

   Fix:
   1. Have backup USB ready
   2. Shut down server (via PiKVM power control)
   3. Replace USB boot drive
   4. Power on
   5. Restore configuration from backup
   6. Transfer the license: keys are tied to the flash drive's GUID,
      so a new drive needs its key replaced (Tools → Registration)
   ```

   **BIOS Settings Changed**
   ```
   Fix:
   1. Enter BIOS (DEL/F2 during boot)
   2. Load defaults
   3. Verify boot order (USB first)
   4. Save and exit
   ```

   **Hardware Failure**
   ```
   Check:
   1. RAM seated properly
   2. All drives detected in BIOS
   3. CPU fan spinning
   4. No error beeps
   ```

4. **Boot from Backup USB**
   ```
   Steps:
   1. Power off server
   2. Insert backup USB boot drive
   3. Power on
   4. Verify boot successful
   5. If the backup USB is older than your newest flash backup,
      bring its configuration up to date:
      - Copy the config/ folder from the newest backup ZIP onto the USB
        (see "USB Flash Backup" below)
      - Reboot
   ```

**Prevention:**
- ✅ Keep USB flash backup updated (weekly)
- ✅ Store backup USB in safe location
- ✅ Document BIOS settings (screenshots via PiKVM)

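After powering the server back on, a small polling loop tells you when it answers again, instead of refreshing the WebUI by hand. A minimal sketch, assuming it runs on another Linux machine on the LAN (the `-W` reply-timeout flag means something different on macOS); `wait_for_host` and its defaults are hypothetical helpers, not part of Unraid. Example: `wait_for_host 192.168.68.51 600 && echo "WebUI: http://192.168.68.51"`.

```bash
# wait_for_host HOST [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Hypothetical helper: pings HOST until it replies or TIMEOUT expires.
# Returns 0 as soon as the host answers, 1 on timeout.
wait_for_host() {
    local host="$1" timeout="${2:-600}" interval="${3:-10}"
    local waited=0
    while (( waited < timeout )); do
        # One echo request, waiting at most 2 seconds for the reply (Linux ping)
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host is up after ~${waited}s"
            return 0
        fi
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    echo "$host still down after ${timeout}s" >&2
    return 1
}
```
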
---

### Scenario 2: Lost Admin Password

**Unraid Root Password Reset:**

1. **Via PiKVM Console** (only if you can still log in, e.g. to rotate a known password)
   ```
   1. Access PiKVM: https://192.168.68.53
   2. View console in browser
   3. Wait for login prompt
   4. Press Ctrl+Alt+F2 (via PiKVM keyboard) and log in as root
   5. At terminal: passwd root
   6. Enter new password twice
   7. Press Ctrl+Alt+F1 to return to GUI
   8. Update documentation
   ```

2. **Via Physical Access** (same caveat: a login prompt asks for the current password)
   ```
   1. Connect monitor and keyboard to server
   2. Press Ctrl+Alt+F2 and log in as root
   3. Run: passwd root
   4. Set new password
   5. Press Ctrl+Alt+F1
   ```

3. **Password Truly Lost: Reset via the Flash Drive**
   ```
   1. Shut down the server (PiKVM power control if the WebUI is locked)
   2. Remove the USB boot drive and plug it into another computer
   3. Delete config/shadow and config/smbpasswd on the drive
      (this clears ALL user passwords, not just root)
   4. Reinsert the drive and boot
   5. Open the WebUI (root now has no password) and set a new one
   6. Re-set passwords for any other users
   ```

**Container Passwords:**
- Check `/mnt/user/appdata/<service>/config`
- Review environment variables in Docker templates
- Use Vaultwarden if accessible
- Check this documentation repo in Gitea

---

### Scenario 3: Container Won't Start

**Quick Diagnosis:**

```bash
# Check container status
docker ps -a | grep <container_name>

# View recent logs
docker logs --tail 100 <container_name>

# Show state, exit code and last error reported by Docker
docker inspect --format '{{.State.Status}} exit={{.State.ExitCode}} error={{.State.Error}}' <container_name>
```

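The three checks above can be folded into one function that prints the state fields `docker inspect` exposes for a failed start, followed by the recent logs. A sketch assuming the standard Docker CLI; `diagnose_container` is a hypothetical name, while `.State.Status`, `.State.ExitCode`, `.State.OOMKilled` and `.State.Error` are fields of `docker inspect` output. Example: `diagnose_container ApacheGuacamole 50`.

```bash
# diagnose_container NAME [LOG_LINES]
# Hypothetical helper: summarizes why a container is not running.
diagnose_container() {
    local name="$1" lines="${2:-30}"
    # Fail early if the container does not exist at all
    if ! docker inspect "$name" >/dev/null 2>&1; then
        echo "No container named '$name'" >&2
        return 1
    fi
    echo "=== $name ==="
    # State fields from docker inspect (Go template syntax)
    docker inspect --format \
        'Status: {{.State.Status}}  ExitCode: {{.State.ExitCode}}  OOMKilled: {{.State.OOMKilled}}
Error: {{.State.Error}}' "$name"
    echo "--- last $lines log lines ---"
    # docker logs sends the container's stderr to stderr, so merge both streams
    docker logs --tail "$lines" "$name" 2>&1
}
```
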
**Common Fixes:**

**Port Conflict:**
```bash
# Find what's using the port (":<port> " so 3000 does not also match 30000)
netstat -tulpn | grep ":<port> "

# Example: Port 3000 already in use
netstat -tulpn | grep ":3000 "

# Stop conflicting service
docker stop <conflicting_container>
```

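When the port is held by another container, `netstat` typically names `docker-proxy` rather than the container itself. Docker's `publish` filter answers the question directly. A sketch; `who_uses_port` is hypothetical, and it only sees ports published by running containers, not host services (use `netstat` for those). Example: `who_uses_port 3000`.

```bash
# who_uses_port PORT
# Hypothetical helper: lists running containers that publish PORT on the host.
who_uses_port() {
    local port="$1"
    # Reject anything that is not a plain port number
    if ! [[ "$port" =~ ^[0-9]+$ ]]; then
        echo "usage: who_uses_port PORT" >&2
        return 2
    fi
    docker ps --filter "publish=$port" --format '{{.Names}}\t{{.Status}}'
}
```
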
**Volume Permission Issues:**
```bash
# Check ownership
ls -la /mnt/user/appdata/<container_name>

# Fix permissions (Unraid standard: 99:100 = nobody:users)
# ⚠️ Check the template's PUID/PGID (or USER_UID/USER_GID) first:
#    some images expect a different owner and break after this chown
chown -R 99:100 /mnt/user/appdata/<container_name>

# Example: Fix Vaultwarden
chown -R 99:100 /mnt/user/appdata/vaultwarden
```

**Dependency Missing:**
```bash
# Example: Guacamole needs MariaDB
docker start mariadb
sleep 10  # Wait for database initialization
docker start ApacheGuacamole

# Verify dependency is running
docker ps | grep mariadb
```

**Resource Exhaustion:**
```bash
# Check cache usage
df -h /mnt/cache

# Check the Docker image too (Unraid mounts docker.img at /var/lib/docker)
df -h /var/lib/docker

# If cache full (>90%), start with dangling images only
docker image prune

# ⚠️ Removes ALL stopped containers and every image not used by a
#    running container, including those of services you keep stopped!
docker system prune -a

# Or free space manually
# See service-inventory.md for cleanup recommendations
```

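The 90% rule above is easy to script. On Unraid it is worth checking the Docker image as well, because containers can fail on a full `docker.img` even while the cache still has room. A sketch assuming GNU `df` (for `--output=pcent`); `check_usage` is hypothetical. Example: `for p in /mnt/cache /var/lib/docker; do check_usage "$p" 90; done`.

```bash
# check_usage PATH [THRESHOLD_PERCENT]
# Hypothetical helper: warns when the filesystem holding PATH is at or above
# THRESHOLD percent used. Returns 0 if below, 1 if at/above, 2 on error.
check_usage() {
    local path="$1" threshold="${2:-90}" used
    # --output=pcent prints a "Use%" header, then e.g. " 42%"; keep the digits
    used=$(df --output=pcent "$path" 2>/dev/null | tail -n 1 | tr -dc '0-9')
    if [[ -z "$used" ]]; then
        echo "Cannot read usage for $path" >&2
        return 2
    fi
    if (( used >= threshold )); then
        echo "⚠️  $path at ${used}% (threshold ${threshold}%)"
        return 1
    fi
    echo "✅ $path at ${used}%"
    return 0
}
```
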
---

### Scenario 4: Network Connectivity Issues

**Can't Access from LAN:**

```bash
# Open a shell on Unraid: SSH, or the PiKVM console if the network is down
ssh root@192.168.68.51

# Check if br0 is up
ip addr show br0
# Should show: 192.168.68.51/22

# Verify default route
ip route | grep default
# Should show: default via 192.168.68.1

# Test router connectivity
ping -c 3 192.168.68.1

# Test internet
ping -c 3 8.8.8.8

# Test DNS (Pi-hole)
nslookup google.com 192.168.68.61
```

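The checks above work from the inside out (router, internet, DNS), and the first one that fails usually points at the culprit. A sketch that runs them in that order and stops at the first failure; `net_triage` is hypothetical, and its default addresses are the ones used in this guide. Example: `net_triage` with no arguments.

```bash
# net_triage [ROUTER] [PUBLIC_IP] [DNS_SERVER]
# Hypothetical helper: checks each network layer in order and reports the
# first one that fails. Returns 0 if everything passes, 1 otherwise.
net_triage() {
    local router="${1:-192.168.68.1}" public="${2:-8.8.8.8}" dns="${3:-192.168.68.61}"

    if ! ping -c 2 -W 2 "$router" >/dev/null 2>&1; then
        echo "FAIL: router $router unreachable (check cable, br0, switch)"
        return 1
    fi
    if ! ping -c 2 -W 2 "$public" >/dev/null 2>&1; then
        echo "FAIL: no internet via $router (check router/ISP)"
        return 1
    fi
    # Resolve a name through Pi-hole specifically, not the system default
    if ! nslookup google.com "$dns" >/dev/null 2>&1; then
        echo "FAIL: DNS server $dns not answering (check Pi-hole/Unbound)"
        return 1
    fi
    echo "OK: router, internet and DNS ($dns) all reachable"
}
```
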
**Fix Network Issues:**

```bash
# Restart networking (from console/PiKVM)
/etc/rc.d/rc.inet1 restart

# If that doesn't work, reboot
reboot
```

**Can't Access Containers:**

```bash
# Check Docker network
docker network inspect bridge

# Verify container IP
docker inspect <container_name> | grep IPAddress

# Test from Unraid host, using the IP from the previous command
curl http://172.17.0.5:8080  # Example: open-webui

# Test port mapping
curl http://192.168.68.51:3000  # Should reach open-webui
```

**DNS Not Resolving:**

```bash
# Test Pi-hole directly
nslookup google.com 192.168.68.61

# If Pi-hole is down, check the Pi Zero is reachable
ping -c 3 192.168.68.61

# SSH to Pi-hole
ssh pi@192.168.68.61

# Check Pi-hole status
pihole status

# Restart if needed
pihole restartdns
```

---

### Scenario 5: Array Won't Start

**Symptoms:**
- Unraid GUI accessible but array shows "Stopped"
- Disks show errors or missing

**Troubleshooting:**

```bash
# Check disk health (device letters can change; confirm on the Main tab)
smartctl -a /dev/sdb  # Parity
smartctl -a /dev/sdc  # Disk 1

# View array settings
cat /boot/config/disk.cfg

# View current disk assignments and status as Unraid sees them
cat /var/local/emhttp/disks.ini

# Check for filesystem errors (read-only check: -n = no modify)
# Requires the array started in Maintenance mode; the device is /dev/md1
# on Unraid releases before 6.12. Check the md device, never /dev/sdX,
# so parity stays valid.
xfs_repair -n /dev/md1p1
```

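Device letters such as `sdb` and `sdc` can change between boots, so confirm them on the Main tab before trusting the comments above. To sweep several disks at once, a loop over `smartctl -H` works. A sketch; `smart_summary` is hypothetical, and it treats a disk as healthy when smartctl prints `result: PASSED` (ATA/NVMe) or `SMART Health Status: OK` (SCSI/SAS). Example: `smart_summary /dev/sdb /dev/sdc /dev/nvme1n1`.

```bash
# smart_summary DEVICE...
# Hypothetical helper: prints one line per device from `smartctl -H`.
# Returns 1 if any device does not report a passing health status.
smart_summary() {
    local dev out rc=0
    for dev in "$@"; do
        # smartctl's exit status is a bitmask that can be non-zero even when
        # the health check passes, so judge by the printed result instead
        out=$(smartctl -H "$dev" 2>/dev/null || true)
        if grep -Eq 'result: PASSED|Health Status: OK' <<<"$out"; then
            echo "✅ $dev"
        else
            echo "❌ $dev (run: smartctl -a $dev)"
            rc=1
        fi
    done
    return "$rc"
}
```
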
**Common Causes:**
- Array not set to auto-start (Settings → Disk Settings → Enable auto start)
- Disk failed (check SMART, may need replacement)
- Unclean shutdown (filesystem check required)
- Disk assignment changed

**Recovery:**

1. **Start Array in Maintenance Mode**
   - Main → Array Operation: tick "Maintenance mode", then click "Start"
   - Run a filesystem check if needed (click the disk → Check Filesystem Status)

2. **Review Logs**
   - Tools → System Log
   - Look for disk errors
   - Check for power events

3. **If Disk Failed**
   - Follow Unraid disk replacement procedure
   - Do NOT format or write to disk unnecessarily
   - Seek help in Unraid forums if uncertain

---

## 🔧 Critical Service Restart Procedures

### Restart Core Services (Proper Order)

**1. Infrastructure First:**
```bash
# Start reverse proxy (for routing)
docker start NginxProxyManager

# Wait for it to be ready
sleep 5
docker ps | grep NginxProxyManager

# Start tunnel (for remote access)
docker start Cloudflared

# Verify both running
docker ps | grep -E "NginxProxyManager|Cloudflared"
```

**2. Security Services:**
```bash
# Password manager (critical!)
docker start vaultwarden

# Wait for healthy status
sleep 10
docker ps | grep vaultwarden
# Should show "(healthy)"

# If not healthy, check logs
docker logs --tail 50 vaultwarden
```

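The fixed `sleep` calls above are guesses: a container can still report `(health: starting)` after 10 seconds. Polling the health status until it settles is more dependable. A sketch; `wait_healthy` is hypothetical, and containers whose image defines no HEALTHCHECK are treated as ready once they are running. Example: `docker start vaultwarden && wait_healthy vaultwarden 60`.

```bash
# wait_healthy NAME [TIMEOUT_SECONDS]
# Hypothetical helper: waits until NAME reports "healthy" (or "running" if the
# image has no HEALTHCHECK). Returns 1 on timeout or on "unhealthy".
wait_healthy() {
    local name="$1" timeout="${2:-60}" waited=0 state=""
    while (( waited < timeout )); do
        # Health status if a HEALTHCHECK exists, otherwise the plain run state
        state=$(docker inspect --format \
            '{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}' \
            "$name" 2>/dev/null || true)
        case "$state" in
            healthy|running) echo "$name: $state"; return 0 ;;
            unhealthy) echo "$name: unhealthy, see docker logs $name" >&2; return 1 ;;
        esac
        sleep 2
        waited=$(( waited + 2 ))
    done
    echo "$name: not ready after ${timeout}s (last state: ${state:-unknown})" >&2
    return 1
}
```
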
**3. Development Tools:**
```bash
# Git server
docker start Gitea

# Wait for initialization
sleep 5

# Remote access gateway
docker start ApacheGuacamole
# Note: Needs MariaDB if configured
```

**4. Monitoring (IMPORTANT!):**
```bash
# Database first
docker start Influxdb

# Wait for DB to initialize
sleep 15

# Then metrics collector
docker start Telegraf

# Finally visualization
docker start Grafana

# Verify all running
docker ps | grep -E "Influxdb|Telegraf|Grafana"
```

**5. Optional Services:**
```bash
# LLM backend
docker start ollama
sleep 10

# LLM interface
docker start open-webui

# Wait for healthy
docker ps | grep open-webui
```

---

### Stop All Services Gracefully

```bash
# Stop all running containers
docker stop $(docker ps -q)

# Verify all stopped
docker ps
# Should list no containers (header line only)

# Wait before stopping array
sleep 5

# Stop array (from GUI)
# Main → Array Operation → Stop
```

---

## 📦 Backup & Restore Procedures

### USB Flash Backup (Unraid Configuration)

**Create Backup:**
1. Navigate to: **Main → Flash → Flash Backup**
2. Click "Backup Now"
3. Download ZIP file (e.g., `unraid-flash-backup-20251031.zip`)
4. Store securely OFF-SERVER:
   - OneDrive: `/z_Unraid/Backups/`
   - External drive
   - Cloud storage

**Restore from Backup:**
```
1. Format new USB drive as FAT32 with the volume label UNRAID
2. Copy backup ZIP to new USB
3. Extract contents to root of USB
   - config/ directory
   - bzimage, bzroot, etc.
4. Make the drive bootable: run make_bootable.bat as administrator
   (Windows) or make_bootable_mac / make_bootable_linux from the USB.
   The Unraid USB Flash Creator can also write a backup ZIP directly.
5. Safely eject USB
6. Boot from new USB
7. Configuration restored automatically
8. New drive means a new GUID: transfer the license (Tools → Registration)
```

**Frequency:**
- Weekly minimum
- After ANY configuration change
- Before major updates

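The GUI backup is the one to rely on, since it also includes the boot files. On a running server the flash drive is mounted at `/boot`, so archiving `/boot/config` gives a quick scripted copy of the settings (including the license key file) between GUI backups. A sketch; `backup_flash_config` and the destination path are hypothetical, and restoring still follows the steps above. Example: `backup_flash_config /mnt/user/backups/flash`, then copy the archive off-server.

```bash
# backup_flash_config DEST_DIR [SOURCE_DIR]
# Hypothetical helper: archives the Unraid config folder into DEST_DIR as
# flash-config-YYYYMMDD_HHMMSS.tar.gz and prints the archive path.
backup_flash_config() {
    local dest="$1" src="${2:-/boot/config}" archive
    if [[ ! -d "$src" ]]; then
        echo "Source $src not found" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    archive="$dest/flash-config-$(date +%Y%m%d_%H%M%S).tar.gz"
    # -C into the parent so the archive holds "config/...", as on the flash drive
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}
```
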
---

### Container Data Backup

**Critical Directories:**

```
Priority 1 (CRITICAL):
  /mnt/user/appdata/vaultwarden/         🚨 Your passwords!
  /mnt/user/appdata/gitea/               🚨 Your code repositories!

Priority 2 (Important):
  /mnt/user/appdata/NginxProxyManager/   Proxy configs
  /mnt/user/appdata/Grafana/             Dashboards
  /mnt/user/appdata/Influxdb/            Metrics history

Priority 3 (Optional):
  /mnt/user/appdata/open-webui/          LLM chat history
```

**Quick Backup Script:**

```bash
#!/bin/bash
# Save as: /mnt/user/scripts/backup-critical.sh
set -euo pipefail

BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)"
CONTAINERS=(vaultwarden Gitea NginxProxyManager)
mkdir -p "$BACKUP_DIR"

# Restart the containers on exit, even if a backup step fails
trap 'echo "Restarting containers..."; docker start "${CONTAINERS[@]}"' EXIT

echo "Stopping containers..."
docker stop "${CONTAINERS[@]}"

echo "Backing up data..."
# tar strips the leading "/", so these restore with: tar -xzf <file> -C /
tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" /mnt/user/appdata/vaultwarden
tar -czf "$BACKUP_DIR/gitea.tar.gz" /mnt/user/appdata/gitea
tar -czf "$BACKUP_DIR/npm.tar.gz" /mnt/user/appdata/NginxProxyManager

echo "✅ Backup complete: $BACKUP_DIR"
ls -lh "$BACKUP_DIR"
```

**Make Executable:**
```bash
chmod +x /mnt/user/scripts/backup-critical.sh
```

**Run Manually:**
```bash
/mnt/user/scripts/backup-critical.sh
```

**Schedule (User Scripts Plugin):**
- Frequency: Daily at 2 AM
- Retention: Keep last 30 days

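The backup script keeps every run, so the 30-day retention has to be enforced separately; the User Scripts plugin only handles the schedule. A sketch that deletes run folders older than N days; `prune_backups` is hypothetical and assumes each run is its own folder directly under the backup root, which is how `backup-critical.sh` lays them out. Example: `prune_backups /mnt/user/backups 30`.

```bash
# prune_backups BACKUP_ROOT [DAYS]
# Hypothetical helper: removes run folders in BACKUP_ROOT last modified more
# than DAYS days ago, printing each one it removes.
prune_backups() {
    local root="$1" days="${2:-30}"
    if [[ ! -d "$root" ]]; then
        echo "Backup root $root not found" >&2
        return 1
    fi
    # -mindepth/-maxdepth 1: only the dated run folders, never the root itself
    find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" \
        -print -exec rm -rf {} +
}
```
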
---

**Restore from Backup:**

```bash
# Example: Restore Vaultwarden
docker stop vaultwarden

# Backup current (corrupted) data
mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old

# Extract backup
tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C /

# Restart container
docker start vaultwarden

# Verify working
curl http://192.168.68.51:4743
```

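Before running the restore above, confirm the archive actually reads back: a truncated `.tar.gz` otherwise fails only partway through extraction, after the live folder has already been moved aside. A sketch; `verify_backup` is hypothetical and test-reads every `*.tar.gz` in one backup folder. Example: `verify_backup /mnt/user/backups/20251031_120000`.

```bash
# verify_backup BACKUP_DIR
# Hypothetical helper: test-reads every .tar.gz in BACKUP_DIR.
# Returns 1 if any archive is unreadable or the folder has no archives.
verify_backup() {
    local dir="$1" f rc=0 found=0
    for f in "$dir"/*.tar.gz; do
        [[ -e "$f" ]] || continue   # glob matched nothing
        found=1
        # Listing the whole archive forces gzip to decompress every block
        if tar -tzf "$f" >/dev/null 2>&1; then
            echo "✅ $(basename "$f")"
        else
            echo "❌ $(basename "$f") is unreadable"
            rc=1
        fi
    done
    if (( ! found )); then
        echo "No .tar.gz archives in $dir" >&2
        return 1
    fi
    return "$rc"
}
```
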
---

## ⚡ Quick Commands Reference

### System Status

```bash
# System uptime and load
uptime

# Resource usage
free -h
df -h

# Array status
cat /proc/mdcmd

# Docker container summary
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Temperature (if sensors installed)
sensors

# Disk health quick check
smartctl -H /dev/sdb  # Parity
smartctl -H /dev/sdc  # Disk 1
```

### Docker Quick Commands

```bash
# Start ALL containers, including the ones you keep stopped on purpose
docker start $(docker ps -aq)

# Stop all running containers
docker stop $(docker ps -q)

# View logs (last 50 lines)
docker logs --tail 50 <container_name>

# Follow logs in real-time
docker logs -f <container_name>

# Restart container
docker restart <container_name>

# Remove container (⚠️ will lose non-volume data!)
docker rm <container_name>

# Clean up unused resources
docker image prune             # Dangling images only (safest)
docker system prune            # ⚠️ Also removes ALL stopped containers and unused networks!
docker system prune -a         # ⚠️ Plus every image not used by a running container!
docker system prune --volumes  # ⚠️ Removes unused volumes too!
```

### Network Diagnostics

```bash
# Check all interfaces
ip addr show

# Test key infrastructure
ping -c 3 192.168.68.1   # Router
ping -c 3 192.168.68.51  # Unraid
ping -c 3 192.168.68.61  # Pi-hole
ping -c 3 8.8.8.8        # Internet

# DNS resolution test
nslookup google.com
nslookup google.com 192.168.68.61  # Test Pi-hole specifically

# Check listening ports
netstat -tulpn | grep LISTEN

# Test specific port
nc -zv 192.168.68.51 3002           # Example: Gitea
curl -I http://192.168.68.51:3002   # HTTP test
```

### Quick Health Check Script

```bash
#!/bin/bash
# Save as: /mnt/user/scripts/health-check.sh

echo "=== Unraid Health Check ==="
echo ""

echo "1. Array Status:"
cat /proc/mdcmd | grep mdState

echo ""
echo "2. Running Containers:"
docker ps --format "table {{.Names}}\t{{.Status}}"

echo ""
echo "3. Disk Usage:"
df -h | grep -E "cache|disk1|Filesystem"

echo ""
echo "4. Network Connectivity:"
ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo "  Router: ✅ OK" || echo "  Router: ❌ FAIL"
ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo "  Internet: ✅ OK" || echo "  Internet: ❌ FAIL"
ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo "  Pi-hole: ✅ OK" || echo "  Pi-hole: ❌ FAIL"

echo ""
echo "5. Critical Services:"
# -f: count HTTP errors (4xx/5xx) as failures; --max-time: don't hang on a stuck service
curl -sf --max-time 5 -o /dev/null http://localhost:4743 && echo "  Vaultwarden: ✅ OK" || echo "  Vaultwarden: ❌ DOWN"
curl -sf --max-time 5 -o /dev/null http://localhost:3002 && echo "  Gitea: ✅ OK" || echo "  Gitea: ❌ DOWN"
curl -sf --max-time 5 -o /dev/null http://localhost:7818 && echo "  NPM: ✅ OK" || echo "  NPM: ❌ DOWN"

echo ""
echo "=== Health Check Complete ==="
```

**Run:** `bash /mnt/user/scripts/health-check.sh`

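Because many containers on this server are kept stopped, `docker ps` alone will not show that something which *should* be running has exited. A sketch that compares a hand-maintained list against what Docker reports; `check_expected` is hypothetical, and the example list reuses the container names from the restart procedures: `check_expected NginxProxyManager Cloudflared vaultwarden Gitea Influxdb Telegraf Grafana`.

```bash
# check_expected NAME...
# Hypothetical helper: reports which of the named containers are not running.
# Returns 1 if any are missing.
check_expected() {
    local running name rc=0
    # One running container name per line
    running=$(docker ps --format '{{.Names}}')
    for name in "$@"; do
        # -x: whole-line match, -F: treat the name literally
        if grep -qxF -- "$name" <<<"$running"; then
            echo "✅ $name"
        else
            echo "❌ $name is not running"
            rc=1
        fi
    done
    return "$rc"
}
```
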
---

## 📞 Getting Help

### Pre-flight Checks

Before asking for help, gather this information:

1. **System Diagnostics**
   - Unraid WebGUI: Tools → Diagnostics → Download
   - Creates ZIP with all logs

2. **Container Logs**
   ```bash
   # 2>&1: docker logs sends the container's stderr to stderr
   docker logs <container_name> > container-logs.txt 2>&1
   ```

3. **Network Configuration**
   ```bash
   ip addr show > network-config.txt
   ip route show >> network-config.txt
   ```

4. **Disk Status**
   ```bash
   smartctl -a /dev/sdb > disk-smart.txt
   smartctl -a /dev/sdc >> disk-smart.txt
   ```

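Steps 2 to 4 can be gathered into one folder to attach next to the diagnostics ZIP. A sketch; `collect_support_info` and the output path are hypothetical, and the disk arguments should be checked against the Main tab first. Example: `collect_support_info /mnt/user/support vaultwarden /dev/sdb /dev/sdc`.

```bash
# collect_support_info OUT_DIR CONTAINER [DISK...]
# Hypothetical helper: saves container logs, network config and SMART data
# into OUT_DIR. A failing tool is recorded in its file rather than aborting.
collect_support_info() {
    local out="$1" container="$2" dev
    shift 2
    mkdir -p "$out" || return 1
    # docker logs sends the container's stderr to stderr, so capture both
    docker logs "$container" > "$out/container-logs.txt" 2>&1 || true
    { ip addr show; ip route show; } > "$out/network-config.txt" 2>&1 || true
    : > "$out/disk-smart.txt"
    for dev in "$@"; do
        # smartctl may exit non-zero even with useful output; keep going
        { echo "=== $dev ==="; smartctl -a "$dev"; } >> "$out/disk-smart.txt" 2>&1 || true
    done
    echo "Saved to $out"
}
```
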
### Community Resources

- **Unraid Forums:** https://forums.unraid.net/
  - Post diagnostics ZIP
  - Be specific about symptoms
  - Include what you've tried

- **r/unraid:** https://reddit.com/r/unraid
  - Quick questions
  - Share diagnostics in pastebin

- **Discord:** Unraid Official Discord
  - Real-time help
  - Active community

### Emergency Contacts

```
ISP Support: [Your ISP Phone Number]
Unraid License: [Store in secure location]
USB Backup Location: [Document where stored]
Off-site Backup: [If applicable]
```

---

## 🎓 Post-Recovery Checklist

After restoring from disaster:

```
[ ] Unraid array started successfully
[ ] All critical services running
    [ ] NginxProxyManager
    [ ] Cloudflared
    [ ] Vaultwarden
    [ ] Gitea
[ ] Network connectivity verified
    [ ] Can access Unraid WebUI
    [ ] Can ping router (192.168.68.1)
    [ ] Internet working
    [ ] DNS resolving (Pi-hole)
[ ] Vaultwarden accessible (test password retrieval)
[ ] Gitea accessible (verify repositories intact)
[ ] NPM routing working (test reverse proxy)
[ ] Monitoring stack restarted
    [ ] Grafana
    [ ] InfluxDB
    [ ] Telegraf
[ ] External access working
    [ ] Tailscale connected
    [ ] Cloudflare tunnel active
[ ] Backups verified and up-to-date
[ ] Documentation updated with lessons learned
[ ] Incident documented in change log (Gitea)
```

---

## 🔒 Security After Recovery

**Immediately After Disaster Recovery:**

1. **Change Passwords** (if compromise suspected)
   ```
   [ ] Unraid root password
   [ ] Vaultwarden master password
   [ ] Container admin passwords
   [ ] Pi-hole admin password
   [ ] PiKVM password
   ```

2. **Review Access Logs**
   ```bash
   # Check SSH attempts (Unraid writes sshd messages to /var/log/syslog,
   # not the Debian-style /var/log/auth.log)
   grep "Failed password" /var/log/syslog | tail -50

   # Check NPM access (2>&1: container logs also arrive on stderr)
   docker logs NginxProxyManager 2>&1 | grep -i error

   # Check Gitea access
   docker logs Gitea 2>&1 | grep -i login
   ```

3. **Verify Firewall Rules**
   ```bash
   iptables -L -n -v
   ```

4. **Check for Unauthorized Changes**
   ```bash
   # Review Docker containers
   docker ps -a

   # Check cron jobs
   crontab -l

   # Review network interfaces
   ip addr show
   ```

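Failed logins from step 2 are easier to judge when grouped by source address: a few from your own LAN are usually typos, while hundreds from one public IP suggest a brute-force attempt. A sketch; `failed_ssh_by_ip` is hypothetical, it assumes OpenSSH's standard `Failed password for USER from IP port N ssh2` wording, and the log path is whichever file holds sshd messages on your system. Example: `failed_ssh_by_ip /var/log/syslog | head`.

```bash
# failed_ssh_by_ip LOGFILE...
# Hypothetical helper: prints "COUNT IP" for failed SSH password attempts,
# most frequent source first.
failed_ssh_by_ip() {
    grep -h "Failed password" "$@" 2>/dev/null |
        # The address is the field right after the word "from"
        awk '{ for (i = 1; i < NF; i++) if ($i == "from") { print $(i + 1); break } }' |
        sort | uniq -c | sort -rn
}
```
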
---

## 📝 Documentation Updates After Incident

**What to Document:**

1. **What Happened:**
   - Date/time of incident
   - Symptoms observed
   - Root cause (if determined)
   - Duration of outage

2. **What You Did:**
   - Steps taken to recover
   - What worked / didn't work
   - Resources used (forums, docs, etc.)
   - Time to recovery

3. **Lessons Learned:**
   - What could prevent this in future
   - Process improvements needed
   - Documentation gaps discovered
   - Backup improvements needed

4. **Action Items:**
   - Backups to implement/improve
   - Monitoring to add
   - Scripts to create
   - Hardware to replace/upgrade

**Where to Document:**
- Create incident report: `docs/incidents/YYYY-MM-DD-incident-name.md`
- Update this quick-start guide with new procedures
- Add to troubleshooting section if recurring issue
- Commit to Gitea with detailed message

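Starting each report from the same skeleton keeps the four sections above consistent between incidents. A sketch that creates `docs/incidents/YYYY-MM-DD-incident-name.md` in a local clone of this repo; `new_incident` is hypothetical, as is the clone path in the example, and its template simply mirrors the headings above. Example: `new_incident ~/homelab-docs "usb boot failure"`.

```bash
# new_incident REPO_DIR NAME
# Hypothetical helper: creates docs/incidents/<today>-<name>.md from a
# skeleton and prints its path. Refuses to overwrite an existing report.
new_incident() {
    local repo="$1" name="$2" file
    # Normalize the name: lowercase, spaces to hyphens (bash 4+)
    name="${name,,}"
    name="${name// /-}"
    file="$repo/docs/incidents/$(date +%F)-$name.md"
    if [[ -e "$file" ]]; then
        echo "$file already exists" >&2
        return 1
    fi
    mkdir -p "$(dirname "$file")" || return 1
    cat > "$file" <<EOF
# Incident: $name ($(date +%F))

## What Happened
- Date/time:
- Symptoms:
- Root cause:
- Duration:

## What I Did
- Steps taken:
- What worked / didn't:
- Time to recovery:

## Lessons Learned

## Action Items
- [ ]
EOF
    echo "$file"
}
```
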
---

## 🚀 Normal Startup Sequence

**From Cold Boot:**

```
1. Power on server
   ↓
2. BIOS POST (~30 seconds)
   - Hardware check
   - Memory test
   - Drive detection
   ↓
3. Unraid boots from USB (~1-2 minutes)
   - Linux kernel loads
   - Unraid OS starts
   ↓
4. Network initializes
   - br0 interface up
   - Gets IP: 192.168.68.51
   ↓
5. Array auto-starts (if configured)
   - Parity disk: sdb
   - Data disk: sdc
   - Cache: nvme1n1p1
   ↓
6. Docker service starts
   - docker0 bridge created
   - Networks initialized
   ↓
7. Containers auto-start (if enabled)
   - Infrastructure services first
   - Then application services
   ↓
8. Services available (~3-5 minutes total)
   ✅ Ready to use!
```

**Expected Boot Time:** 3-5 minutes
**If Taking Longer:** Check system log for errors

---

## 🎯 Quick Health Check Command

**Run After Any Restart:**

```bash
# Quick health check: each line runs on its own, so a grep with no
# matches can no longer make the network test print "FAIL"
docker ps --format "table {{.Names}}\t{{.Status}}"
df -h | grep -E "cache|disk1"
ping -c 2 192.168.68.1 >/dev/null && echo "Network: OK" || echo "Network: FAIL"
```

---

## 📚 Related Documentation

- **Network Issues:** See `network-map.md`
- **Service Details:** See `service-inventory.md`
- **Container Configs:** See `docker-compose/` (when created)
- **Main Overview:** See `README.md`

---

## 🆘 True Emergency - Complete System Down

**If everything is down and you need immediate help:**

1. **Access via PiKVM**
   - https://192.168.68.53
   - Get console access
   - View what's happening

2. **Check Physical Server**
   - Power LED on?
   - Fans spinning?
   - Drives spinning up?
   - Network activity lights?

3. **Try Safe Mode Boot**
   - Select "Unraid OS Safe Mode (no plugins, no GUI)" from the boot menu (or the GUI Safe Mode entry)
   - Diagnose from console

4. **Community Help**
   - Unraid Discord (fastest response)
   - Forums with diagnostics ZIP
   - r/unraid for quick questions

5. **Document Everything**
   - Take photos/screenshots via PiKVM
   - Note exact error messages
   - Record what you tried
   - Timeline of events

---

## 💡 Pro Tips

1. **Test Your Backups**
   - Restore test annually
   - Verify data integrity
   - Practice recovery procedures

2. **Keep This Guide Accessible**
   - Save offline copy to phone/laptop
   - Print critical sections
   - Bookmark in browser

3. **Automate Where Possible**
   - Schedule backup scripts
   - Set up monitoring alerts
   - Use User Scripts plugin

4. **Document As You Go**
   - Update after fixing issues
   - Add new procedures discovered
   - Note what worked/didn't work

---

**Last Updated:** October 31, 2025
**Next Review:** Quarterly or after incidents
**Maintained By:** Weston

---

**Remember:** Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help!

**Keep this guide accessible even when the server is down!**
💡 **Pro Tip:** Save a copy to your phone/laptop/OneDrive!

🚀 **You've got this!**