Added comprehensive homelab documentation:
README.md:
- Hardware inventory and specifications
- Network architecture overview
- Running services catalog
- Quick reference commands
- Project goals and roadmap
docs/network-map.md:
- All device IP assignments
- Port reference guide
- DNS configuration (Pi-hole + Unbound)
- Remote access setup (Tailscale + Cloudflare)
- Troubleshooting commands
docs/service-inventory.md:
- All 32 Docker containers cataloged
- Running services analysis (6 containers)
- Stopped services review (26 containers)
- Resource usage and recommendations
- Container decision matrix
- Cleanup plan to free 40GB
- Security recommendations
- Prioritized action plan
docs/quick-start.md:
- Emergency recovery procedures
- Service restart sequences
- Backup/restore guides with scripts
- Troubleshooting by scenario
- Health check automation
- Post-recovery checklist
- Common problem solutions
This establishes the foundation for all future homelab projects.
Phase 1 documentation complete! 🎉
🚀 Quick Start & Emergency Recovery Guide
Purpose: Get your homelab back online quickly after a disaster
Target Time: 30-60 minutes to basic functionality
Last Updated: October 31, 2025
🎯 Quick Access Reference
Essential URLs
| Service | URL | Credentials |
|---|---|---|
| Unraid Dashboard | http://192.168.68.51 | root / (your password) |
| Gitea | https://gitea.segelschiff.app | Weston / (your password) |
| Vaultwarden | http://192.168.68.51:4743 | Master password |
| NPM Admin | http://192.168.68.51:7818 | admin@example.com / changeme (first login) |
| Pi-hole | http://192.168.68.61/admin | (your password) |
| PiKVM | https://192.168.68.53 | admin / admin (default) |
SSH Access
# Local network
ssh root@192.168.68.51
# Via Tailscale (from anywhere)
ssh root@100.122.220.126
# Emergency: Use PiKVM for console access
# https://192.168.68.53
🆘 Emergency Recovery Scenarios
Scenario 1: Server Won't Boot 🚨
Symptoms:
- No network connectivity to 192.168.68.51
- Unraid WebUI unreachable
- No response to ping
Recovery Steps:
1. Physical Check (via PiKVM or in person)
   [ ] Server has power (check LED)
   [ ] Network cable connected to eth0
   [ ] Monitor shows output (via PiKVM)
   [ ] USB boot drive is present and detected
2. Use PiKVM for Remote Console
   - Access: https://192.168.68.53
   - Login: admin / admin
   - View boot process
   - Check BIOS/boot messages
3. Common Boot Issues
   USB Boot Drive Failure (most common!)
   Symptoms: "Boot device not found" or similar
   Fix:
   1. Have backup USB ready
   2. Shut down server (via PiKVM power control)
   3. Replace USB boot drive
   4. Power on
   5. Restore configuration from backup
   BIOS Settings Changed
   Fix:
   1. Enter BIOS (DEL/F2 during boot)
   2. Load defaults
   3. Verify boot order (USB first)
   4. Save and exit
   Hardware Failure
   Check:
   1. RAM seated properly
   2. All drives detected in BIOS
   3. CPU fan spinning
   4. No error beeps
4. Boot from Backup USB
   Steps:
   1. Power off server
   2. Insert backup USB boot drive
   3. Power on
   4. Verify boot successful
   5. Restore configuration:
      - Tools → Flash Backup → Browse → Select backup ZIP
      - Reboot
Prevention:
- ✅ Keep USB flash backup updated (weekly)
- ✅ Store backup USB in safe location
- ✅ Document BIOS settings (screenshots via PiKVM)
Scenario 2: Lost Admin Password
Unraid Root Password Reset:
1. Via PiKVM Console
   1. Access PiKVM: https://192.168.68.53
   2. View console in browser
   3. Wait for login prompt
   4. Press Ctrl+Alt+F2 (via PiKVM keyboard)
   5. At terminal: passwd root
   6. Enter new password twice
   7. Press Ctrl+Alt+F1 to return to GUI
   8. Update documentation
2. Via Physical Access
   1. Connect monitor and keyboard to server
   2. Press Ctrl+Alt+F2
   3. Run: passwd root
   4. Set new password
   5. Press Ctrl+Alt+F1
Note: `passwd root` only works from a shell that is already logged in as root. A fresh login prompt will ask for the very password you lost. If no root session is open, reset the password from the flash drive instead, following the Unraid documentation's root password reset procedure.
Container Passwords:
- Check `/mnt/user/appdata/<service>/config`
- Review environment variables in Docker templates
- Use Vaultwarden if accessible
- Check this documentation repo in Gitea
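You can review a running container's environment without printing secrets to the screen. A small filter lists only the variable names that look like credentials. This is a sketch: the name pattern (PASS, TOKEN, SECRET, KEY, PWD) is an assumption about how your templates name things, and `find_secret_keys` is a hypothetical helper, not an Unraid tool.

```shell
#!/bin/bash
# find_secret_keys (hypothetical helper)
# Reads KEY=VALUE lines on stdin and prints only the KEY names that look like
# credentials, so secret values never land in the terminal or scrollback.
# The name pattern is an assumption; extend it to match your templates.
find_secret_keys() {
  # keep well-formed KEY=VALUE lines, drop the values, filter by name
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' \
    | cut -d= -f1 \
    | grep -iE 'PASS|TOKEN|SECRET|KEY|PWD' \
    | sort -u
}
```

Example: `docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' vaultwarden | find_secret_keys`. Once you know the key name, look up its value in the container's template on the Unraid Docker tab.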
Scenario 3: Container Won't Start
Quick Diagnosis:
# Check container status
docker ps -a | grep <container_name>
# View recent logs
docker logs --tail 100 <container_name>
# Look for errors
docker inspect <container_name> | grep -i error
Common Fixes:
Port Conflict:
# Find what's using the port
netstat -tulpn | grep <port>
# Example: Port 3000 already in use
netstat -tulpn | grep 3000
# Stop conflicting service
docker stop <conflicting_container>
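If `netstat` is not installed, bash can probe a port by itself. The sketch below assumes a bash build with `/dev/tcp` support (standard on Linux) and the coreutils `timeout` command. `port_in_use` is a hypothetical helper name.

```shell
#!/bin/bash
# port_in_use PORT [HOST]  (hypothetical helper)
# Returns 0 if something accepts TCP connections on HOST:PORT, non-zero otherwise.
# Uses bash's built-in /dev/tcp redirection, so it works without netstat or ss.
port_in_use() {
  local port=$1 host=${2:-127.0.0.1}
  # timeout keeps a filtered (silently dropped) port from hanging the check
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

`port_in_use 3000 && docker ps --filter "publish=3000" --format '{{.Names}}'` first checks whether port 3000 is taken, then shows which container publishes it. If no container is listed, a host process holds the port.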
Volume Permission Issues:
# Check ownership
ls -la /mnt/user/appdata/<container_name>
# Fix permissions (Unraid standard: 99:100)
chown -R 99:100 /mnt/user/appdata/<container_name>
# Example: Fix Vaultwarden
chown -R 99:100 /mnt/user/appdata/vaultwarden
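Before running a recursive `chown`, it helps to see which paths are actually wrong. This is a minimal sketch that assumes GNU `find`, which provides `-uid` and `-gid`. The helper name is hypothetical.

```shell
#!/bin/bash
# find_wrong_owner DIR [UID] [GID]  (hypothetical helper)
# Prints every path under DIR whose owner is not UID or whose group is not GID.
# Defaults follow the Unraid convention used above (99:100 = nobody:users).
find_wrong_owner() {
  local dir=$1 uid=${2:-99} gid=${3:-100}
  find "$dir" \( ! -uid "$uid" -o ! -gid "$gid" \) -print
}
```

`find_wrong_owner /mnt/user/appdata/vaultwarden | head` prints a sample of mis-owned paths. An empty result means the `chown` is unnecessary. Some images deliberately run as their own UID, so a mismatch is not always a fault; check the image's documentation before changing ownership.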
Dependency Missing:
# Example: Guacamole needs MariaDB
docker start mariadb
sleep 10 # Wait for database initialization
docker start ApacheGuacamole
# Verify dependency is running
docker ps | grep mariadb
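The fixed `sleep 10` fails whenever the database takes longer than usual to come up. A polling loop waits exactly as long as needed, up to a limit. This is a sketch only. `wait_for` is a hypothetical helper. The readiness probe in the usage example assumes your MariaDB image ships `mariadb-admin` (older images name it `mysqladmin`).

```shell
#!/bin/bash
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]  (hypothetical helper)
# Re-runs COMMAND once per second until it succeeds or the timeout expires.
# Returns 0 on success, 1 on timeout.
wait_for() {
  local timeout=$1; shift
  local deadline=$(( SECONDS + timeout ))   # SECONDS is bash's built-in uptime counter
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}
```

Example: `docker start mariadb && wait_for 60 docker exec mariadb mariadb-admin ping --silent && docker start ApacheGuacamole`. The ping succeeds as soon as the server answers, so no credentials are needed.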
Resource Exhaustion:
# Check cache usage
df -h /mnt/cache
# If cache full (>90%), clean up
docker system prune -a # ⚠️ ALSO DELETES ALL STOPPED CONTAINERS and unused images!
# Or free space manually
# See service-inventory.md for cleanup recommendations
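The "cache full (>90%)" rule can be checked by script instead of by eye. This sketch parses `df -P` (the POSIX output format, one line per filesystem) and uses the 90% threshold from the text as its default. Both helper names are hypothetical.

```shell
#!/bin/bash
# usage_pct PATH  (hypothetical helper)
# Prints the used-space percentage (integer, no % sign) of the filesystem holding PATH.
# df -P keeps long device names from wrapping and shifting the columns.
usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_usage PATH [THRESHOLD]  (hypothetical helper)
# Returns 0 below THRESHOLD percent, 1 at or above it, 2 if PATH can't be read.
check_usage() {
  local path=$1 threshold=${2:-90} pct
  pct=$(usage_pct "$path" 2>/dev/null)
  [[ -n $pct ]] || return 2
  if (( pct >= threshold )); then
    echo "⚠️ $path is ${pct}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "$path is ${pct}% full"
}
```

`check_usage /mnt/cache` prints the current figure and returns 1 at or above 90%. `check_usage /mnt/cache 80` tightens the threshold.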
Scenario 4: Network Connectivity Issues
Can't Access from LAN:
# SSH into Unraid (via PiKVM if network down)
ssh root@192.168.68.51
# Check if br0 is up
ip addr show br0
# Should show: 192.168.68.51/22
# Verify IP and routes
ip route | grep default
# Should show: default via 192.168.68.1
# Test router connectivity
ping -c 3 192.168.68.1
# Test internet
ping -c 3 8.8.8.8
# Test DNS (Pi-hole)
nslookup google.com 192.168.68.61
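The checks above are ordered so that each layer depends on the one before it. If the router ping fails, the internet and DNS tests will fail too. A small runner can stop at the first failing layer and name it. This is a hypothetical helper: each argument is a `label|command` string (split at the first `|`), and the commands in the usage example are the ones listed above.

```shell
#!/bin/bash
# first_failure "label|command" ...  (hypothetical helper)
# Runs each command in order and stops at the first one that fails,
# because every later layer depends on the earlier ones.
first_failure() {
  local step label cmd
  for step in "$@"; do
    label=${step%%|*}   # text before the first |
    cmd=${step#*|}      # everything after it (may itself contain pipes)
    if ! bash -c "$cmd" >/dev/null 2>&1; then
      echo "FAIL at: $label"
      return 1
    fi
    echo "OK: $label"
  done
}
```

Example: `first_failure "br0 address|ip addr show br0 | grep -q 192.168.68.51" "router|ping -c 3 192.168.68.1" "internet|ping -c 3 8.8.8.8" "Pi-hole DNS|nslookup google.com 192.168.68.61"`.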
Fix Network Issues:
# Restart networking (from console/PiKVM)
/etc/rc.d/rc.inet1 restart
# If that doesn't work, reboot
reboot
Can't Access Containers:
# Check Docker network
docker network inspect bridge
# Verify container IP
docker inspect <container_name> | grep IPAddress
# Test from Unraid host
curl http://172.17.0.5:8080 # Example: open-webui
# Test port mapping
curl http://192.168.68.51:3000 # Should reach open-webui
DNS Not Resolving:
# Test Pi-hole directly
nslookup google.com 192.168.68.61
# If Pi-hole down, check Pi Zero
ping 192.168.68.61
# SSH to Pi-hole
ssh pi@192.168.68.61
# Check Pi-hole status
pihole status
# Restart if needed
pihole restartdns
Scenario 5: Array Won't Start
Symptoms:
- Unraid GUI accessible but array shows "Stopped"
- Disks show errors or missing
Troubleshooting:
# Check disk health
smartctl -a /dev/sdb # Parity
smartctl -a /dev/sdc # Disk 1
# View disk assignments
cat /boot/config/disk.cfg
# Check for filesystem errors (read-only check; array must be started in Maintenance mode)
xfs_repair -n /dev/md1p1
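To check several disks at once, you can extract the overall verdict from `smartctl -H` for each drive. The sketch assumes the result lines smartctl prints for ATA/NVMe drives ("SMART overall-health self-assessment test result: PASSED") and for SAS/SCSI drives ("SMART Health Status: OK"). Helper names are hypothetical. A PASSED verdict does not replace reading the full `smartctl -a` attributes.

```shell
#!/bin/bash
# smart_passed  (hypothetical helper)
# Reads `smartctl -H` output on stdin; returns 0 only on a passing verdict.
smart_passed() {
  grep -qE 'self-assessment test result: PASSED|SMART Health Status: OK'
}

# check_disks DEV...  (hypothetical helper), e.g. check_disks sdb sdc
check_disks() {
  local dev
  for dev in "$@"; do
    if smartctl -H "/dev/$dev" | smart_passed; then
      echo "$dev: PASSED"
    else
      echo "$dev: ⚠️ check 'smartctl -a /dev/$dev'"
    fi
  done
}
```

`check_disks sdb sdc` covers the parity disk and data disk named in this guide.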
Common Causes:
- Parity sync in progress (wait for completion)
- Disk failed (check SMART, may need replacement)
- Unclean shutdown (filesystem check required)
- Disk assignment changed
Recovery:
1. Start Array in Maintenance Mode
   - Tick the "Maintenance mode" checkbox on the Main page, then click "Start"
   - Run a filesystem check if prompted
2. Review Logs
   - Tools → System Log
   - Look for disk errors
   - Check for power events
3. If Disk Failed
   - Follow Unraid disk replacement procedure
   - Do NOT format or write to disk unnecessarily
   - Seek help in Unraid forums if uncertain
🔧 Critical Service Restart Procedures
Restart Core Services (Proper Order)
1. Infrastructure First:
# Start reverse proxy (for routing)
docker start NginxProxyManager
# Wait for it to be ready
sleep 5
docker ps | grep NginxProxyManager
# Start tunnel (for remote access)
docker start Cloudflared
# Verify both running
docker ps | grep -E "NginxProxyManager|Cloudflared"
2. Security Services:
# Password manager (critical!)
docker start vaultwarden
# Wait for healthy status
sleep 10
docker ps | grep vaultwarden
# Should show "(healthy)"
# If not healthy, check logs
docker logs --tail 50 vaultwarden
3. Development Tools:
# Git server
docker start Gitea
# Wait for initialization
sleep 5
# Remote access gateway
docker start ApacheGuacamole
# Note: Needs MariaDB if configured
4. Monitoring (IMPORTANT!):
# Database first
docker start Influxdb
# Wait for DB to initialize
sleep 15
# Then metrics collector
docker start Telegraf
# Finally visualization
docker start Grafana
# Verify all running
docker ps | grep -E "Influxdb|Telegraf|Grafana"
5. Optional Services:
# LLM backend
docker start ollama
sleep 10
# LLM interface
docker start open-webui
# Wait for healthy
docker ps | grep open-webui
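The fixed `sleep` values above are guesses. A script can instead wait until Docker reports each container as ready. This minimal sketch treats a container as "ready" when its healthcheck reports `healthy` or, for images without a healthcheck, when it is simply `running` (which does not prove the app inside is up). Helper names and the 90-second timeout in the example are hypothetical.

```shell
#!/bin/bash
# container_state NAME  (hypothetical helper)
# Prints the healthcheck status if the image defines one, else the plain state.
container_state() {
  docker inspect --format \
    '{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}' "$1"
}

# start_and_wait NAME [TIMEOUT_SECONDS]  (hypothetical helper)
start_and_wait() {
  local name=$1 timeout=${2:-60} state
  docker start "$name" >/dev/null || return 1
  local deadline=$(( SECONDS + timeout ))
  while :; do
    state=$(container_state "$name")
    case $state in
      healthy|running) echo "✅ $name: $state"; return 0 ;;
      unhealthy|exited|dead) echo "❌ $name: $state" >&2; return 1 ;;
    esac
    # still "starting" (or similar): keep polling until the deadline
    if (( SECONDS >= deadline )); then
      echo "❌ $name: still '$state' after ${timeout}s" >&2
      return 1
    fi
    sleep 2
  done
}
```

`for c in NginxProxyManager Cloudflared vaultwarden Gitea Influxdb Telegraf Grafana; do start_and_wait "$c" 90 || break; done` follows the order above and stops at the first container that fails to come up.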
Stop All Services Gracefully
# Stop all running containers
docker stop $(docker ps -q)
# Verify all stopped
docker ps
# Should show empty output
# Wait before stopping array
sleep 5
# Stop array (from GUI)
# Main → Array Operation → Stop
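`docker stop $(docker ps -q)` stops everything in no particular order. Stopping in the reverse of the startup order lets applications shut down before the databases and proxy they rely on. This is a sketch only. The container list in the example mirrors the restart sections above, and the 30-second grace period passed to `docker stop -t` is an assumed value.

```shell
#!/bin/bash
# stop_reverse TIMEOUT NAME...  (hypothetical helper)
# Stops the named containers last-to-first. -t gives each one TIMEOUT seconds
# to shut down cleanly before Docker kills it.
stop_reverse() {
  local timeout=$1; shift
  local i
  for (( i = $# ; i >= 1 ; i-- )); do
    docker stop -t "$timeout" "${!i}"   # ${!i} = the i-th positional argument
  done
}
```

Example: `stop_reverse 30 NginxProxyManager Cloudflared vaultwarden Gitea ApacheGuacamole Influxdb Telegraf Grafana ollama open-webui`, then run `docker ps` to confirm nothing is left running.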
📦 Backup & Restore Procedures
USB Flash Backup (Unraid Configuration)
Create Backup:
- Navigate to: Main → Flash → Flash Backup
- Click "Backup Now"
- Download ZIP file (e.g., `unraid-flash-backup-20251031.zip`)
- Store securely OFF-SERVER:
  - OneDrive: `/z_Unraid/Backups/`
  - External drive
  - Cloud storage
Restore from Backup:
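1. Format the new USB drive as FAT32 with the volume label UNRAID (Unraid looks for this label at boot)
2. Copy backup ZIP to new USB
3. Extract contents to root of USB
   - config/ directory
   - bzimage, bzroot, etc.
4. Run the make_bootable script for your OS from the USB root (as administrator)
5. Safely eject USB
6. Boot from new USB
7. Configuration is restored from config/; a different physical USB drive also needs the license transferred to it (Tools → Registration)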
Frequency:
- Weekly minimum
- After ANY configuration change
- Before major updates
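To confirm the weekly minimum is actually being met, a check can look for a recent ZIP wherever you keep flash backups. This assumes the downloaded files keep "flash-backup" in their names (as in the example above) and that you point the check at a local or synced folder. The helper name is hypothetical.

```shell
#!/bin/bash
# flash_backup_fresh DIR [MAX_DAYS]  (hypothetical helper)
# Returns 0 if DIR holds a *flash-backup*.zip modified within MAX_DAYS days
# (default 7, matching "weekly minimum"), 1 otherwise.
flash_backup_fresh() {
  local dir=$1 max_days=${2:-7}
  [ -n "$(find "$dir" -maxdepth 1 -name '*flash-backup*.zip' \
            -mtime -"$max_days" -print -quit)" ]
}
```

Example: `flash_backup_fresh /path/to/flash-backups || echo "No flash backup in the last 7 days"`. The path is a placeholder for your OneDrive sync folder or external drive.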
Container Data Backup
Critical Directories:
Priority 1 (CRITICAL):
/mnt/user/appdata/vaultwarden/ 🚨 Your passwords!
/mnt/user/appdata/gitea/ 🚨 Your code repositories!
Priority 2 (Important):
/mnt/user/appdata/NginxProxyManager/ Proxy configs
/mnt/user/appdata/Grafana/ Dashboards
/mnt/user/appdata/Influxdb/ Metrics history
Priority 3 (Optional):
/mnt/user/appdata/open-webui/ LLM chat history
Quick Backup Script:
#!/bin/bash
# Save as: /mnt/user/scripts/backup-critical.sh
set -euo pipefail
BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)"
CONTAINERS=(vaultwarden Gitea NginxProxyManager)
mkdir -p "$BACKUP_DIR"
# Restart the containers on ANY exit, so a failed tar never leaves them stopped
trap 'echo "Restarting containers..."; docker start "${CONTAINERS[@]}"' EXIT
echo "Stopping containers..."
docker stop "${CONTAINERS[@]}"
echo "Backing up data..."
# -C / plus relative paths avoids tar's "Removing leading /" warning;
# the restore command below (tar -xzf ... -C /) works unchanged
tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" -C / mnt/user/appdata/vaultwarden
tar -czf "$BACKUP_DIR/gitea.tar.gz" -C / mnt/user/appdata/gitea
tar -czf "$BACKUP_DIR/npm.tar.gz" -C / mnt/user/appdata/NginxProxyManager
echo "✅ Backup complete: $BACKUP_DIR"
ls -lh "$BACKUP_DIR"
Make Executable:
chmod +x /mnt/user/scripts/backup-critical.sh
Run Manually:
/mnt/user/scripts/backup-critical.sh
Schedule (User Scripts Plugin):
- Frequency: Daily at 2 AM
- Retention: Keep last 30 days (the script itself never deletes old backups, so pruning must be added separately)
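One way to implement the 30-day retention is to delete backup folders older than 30 days after each run. This minimal sketch assumes the one-folder-per-run layout that `backup-critical.sh` creates under `/mnt/user/backups`. `prune_old_backups` is a hypothetical helper. It uses `rm -rf`, so try it on a scratch directory first.

```shell
#!/bin/bash
# prune_old_backups ROOT [KEEP_DAYS]  (hypothetical helper)
# Deletes direct child directories of ROOT older than KEEP_DAYS (default 30).
# -mindepth 1 ensures ROOT itself is never removed.
prune_old_backups() {
  local root=$1 keep_days=${2:-30}
  [ -d "$root" ] || { echo "not a directory: $root" >&2; return 1; }
  find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$keep_days" \
    -print -exec rm -rf {} +
}
```

Define the function in `backup-critical.sh` and call `prune_old_backups /mnt/user/backups 30` as the last step.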
Restore from Backup:
# Example: Restore Vaultwarden
docker stop vaultwarden
# Backup current (corrupted) data
mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old
# Extract backup
tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C /
# Restart container
docker start vaultwarden
# Verify working
curl http://192.168.68.51:4743
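Confirm the archive is intact before moving the live data aside. Discovering a corrupt archive halfway through the restore makes recovery messier. This sketch relies on `tar -tzf` decompressing and listing the whole archive. `verify_archive` is a hypothetical name.

```shell
#!/bin/bash
# verify_archive FILE.tar.gz  (hypothetical helper)
# Lists every entry; a truncated or corrupt archive makes tar exit non-zero.
verify_archive() {
  local archive=$1
  if tar -tzf "$archive" >/dev/null 2>&1; then
    echo "OK: $archive ($(tar -tzf "$archive" | wc -l) entries)"
  else
    echo "CORRUPT or unreadable: $archive" >&2
    return 1
  fi
}
```

`verify_archive /mnt/user/backups/20251031_120000/vaultwarden.tar.gz && docker stop vaultwarden` starts the restore only when the archive lists cleanly. A clean listing proves the file is readable, not that the data inside is consistent.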
⚡ Quick Commands Reference
System Status
# System uptime and load
uptime
# Resource usage
free -h
df -h
# Array status
cat /proc/mdcmd
# Docker container summary
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# Temperature (if sensors installed)
sensors
# Disk health quick check
smartctl -H /dev/sdb # Parity
smartctl -H /dev/sdc # Disk 1
Docker Quick Commands
# Start all stopped containers (⚠️ includes any you stopped on purpose)
docker start $(docker ps -aq)
# Stop all running containers
docker stop $(docker ps -q)
# View logs (last 50 lines)
docker logs --tail 50 <container_name>
# Follow logs in real-time
docker logs -f <container_name>
# Restart container
docker restart <container_name>
# Remove container (⚠️ will lose non-volume data!)
docker rm <container_name>
# Clean up unused resources
docker system prune # ⚠️ Deletes ALL stopped containers, unused networks, dangling images
docker system prune -a # ⚠️ Same, plus every image not used by a container!
docker system prune --volumes # ⚠️ Same, plus unused volumes!
Network Diagnostics
# Check all interfaces
ip addr show
# Test key infrastructure
ping -c 3 192.168.68.1 # Router
ping -c 3 192.168.68.51 # Unraid
ping -c 3 192.168.68.61 # Pi-hole
ping -c 3 8.8.8.8 # Internet
# DNS resolution test
nslookup google.com
nslookup google.com 192.168.68.61 # Test Pi-hole specifically
# Check listening ports
netstat -tulpn | grep LISTEN
# Test specific port
nc -zv 192.168.68.51 3002 # Example: Gitea
curl -I http://192.168.68.51:3002 # HTTP test
Quick Health Check Script
#!/bin/bash
# Save as: /mnt/user/scripts/health-check.sh
echo "=== Unraid Health Check ==="
echo ""
echo "1. Array Status:"
cat /proc/mdcmd | grep mdState
echo ""
echo "2. Running Containers:"
docker ps --format "table {{.Names}}\t{{.Status}}"
echo ""
echo "3. Disk Usage:"
df -h | grep -E "cache|disk1|Filesystem"
echo ""
echo "4. Network Connectivity:"
ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo " Router: ✅ OK" || echo " Router: ❌ FAIL"
ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo " Internet: ✅ OK" || echo " Internet: ❌ FAIL"
ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo " Pi-hole: ✅ OK" || echo " Pi-hole: ❌ FAIL"
echo ""
echo "5. Critical Services:"
curl -fs --max-time 5 http://localhost:4743 >/dev/null && echo " Vaultwarden: ✅ OK" || echo " Vaultwarden: ❌ DOWN"
curl -fs --max-time 5 http://localhost:3002 >/dev/null && echo " Gitea: ✅ OK" || echo " Gitea: ❌ DOWN"
curl -fs --max-time 5 http://localhost:7818 >/dev/null && echo " NPM: ✅ OK" || echo " NPM: ❌ DOWN"
echo ""
echo "=== Health Check Complete ==="
Run: bash /mnt/user/scripts/health-check.sh
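The script above only prints results. If you want a scheduler or wrapper to react to problems, the checks can also produce an exit status. This sketch keeps the same targets. `check` and `FAILURES` are hypothetical names.

```shell
#!/bin/bash
# check LABEL COMMAND [ARGS...]  (hypothetical helper)
# Prints OK/FAIL in the same style as the script above and counts failures.
FAILURES=0
check() {
  local label=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo " $label: ✅ OK"
  else
    echo " $label: ❌ FAIL"
    FAILURES=$(( FAILURES + 1 ))
  fi
}
```

Replace the ping/curl lines with calls such as `check "Router" ping -c 2 192.168.68.1` and `check "Vaultwarden" curl -fs --max-time 5 http://localhost:4743`. End the script with `exit $(( FAILURES > 0 ))` so any failure produces a non-zero exit code.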
📞 Getting Help
Pre-flight Checks
Before asking for help, gather this information:
1. System Diagnostics
   - Unraid WebGUI: Tools → Diagnostics → Download
   - Creates ZIP with all logs
2. Container Logs
   # 2>&1 also captures stderr, where many containers write their logs
   docker logs <container_name> > container-logs.txt 2>&1
3. Network Configuration
   ip addr show > network-config.txt
   ip route show >> network-config.txt
4. Disk Status
   smartctl -a /dev/sdb > disk-smart.txt
   smartctl -a /dev/sdc >> disk-smart.txt
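Gathering several outputs by hand is easy to leave half-done under pressure. A helper can capture each command's output and errors into one folder and carry on past failures. This is a sketch only. `capture` is hypothetical, and `/tmp` is suggested because it does not depend on the array being started. On Unraid, `/tmp` lives in RAM, so copy the bundle off before rebooting.

```shell
#!/bin/bash
# capture DIR NAME COMMAND [ARGS...]  (hypothetical helper)
# Saves the command line plus its stdout AND stderr to DIR/NAME.txt.
# Always returns 0 so one failing tool doesn't abort the whole collection.
capture() {
  local dir=$1 name=$2; shift 2
  mkdir -p "$dir"
  { echo "\$ $*"; "$@"; } > "$dir/$name.txt" 2>&1 \
    || echo "note: '$*' exited non-zero" >&2
  return 0
}
```

Typical run: first `OUT=/tmp/diag-$(date +%Y%m%d_%H%M%S)`. Then run `capture "$OUT" container-logs docker logs vaultwarden`, `capture "$OUT" network-config ip addr show` and `capture "$OUT" disk-sdb smartctl -a /dev/sdb`. Finally, `tar -czf "$OUT.tar.gz" -C /tmp "$(basename "$OUT")"` produces one file to attach.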
Community Resources
1. Unraid Forums: https://forums.unraid.net/
   - Post diagnostics ZIP
   - Be specific about symptoms
   - Include what you've tried
2. r/unraid: https://reddit.com/r/unraid
   - Quick questions
   - Share log excerpts via pastebin
3. Discord: Unraid Official Discord
   - Real-time help
   - Active community
Emergency Contacts
ISP Support: [Your ISP Phone Number]
Unraid License: [Store in secure location]
USB Backup Location: [Document where stored]
Off-site Backup: [If applicable]
🎓 Post-Recovery Checklist
After restoring from disaster:
[ ] Unraid array started successfully
[ ] All critical services running
  [ ] NginxProxyManager
  [ ] Cloudflared
  [ ] Vaultwarden
  [ ] Gitea
[ ] Network connectivity verified
  [ ] Can access Unraid WebUI
  [ ] Can ping router (192.168.68.1)
  [ ] Internet working
  [ ] DNS resolving (Pi-hole)
[ ] Vaultwarden accessible (test password retrieval)
[ ] Gitea accessible (verify repositories intact)
[ ] NPM routing working (test reverse proxy)
[ ] Monitoring stack restarted
  [ ] Grafana
  [ ] InfluxDB
  [ ] Telegraf
[ ] External access working
  [ ] Tailscale connected
  [ ] Cloudflare tunnel active
[ ] Backups verified and up-to-date
[ ] Documentation updated with lessons learned
[ ] Incident documented in change log (Gitea)
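The "All critical services running" items can be ticked off by script. This sketch assumes the container names used elsewhere in this guide (names are case-sensitive). `missing_containers` is a hypothetical helper.

```shell
#!/bin/bash
# missing_containers NAME...  (hypothetical helper)
# Prints each expected container that is NOT running; returns 1 if any are missing.
missing_containers() {
  local running name missing=0
  running=$(docker ps --format '{{.Names}}')
  for name in "$@"; do
    # -x whole-line match, -F literal string: "Gitea" must match exactly
    if ! grep -qxF "$name" <<< "$running"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Example: `missing_containers NginxProxyManager Cloudflared vaultwarden Gitea Grafana Influxdb Telegraf && echo "All critical containers running"`.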
🔒 Security After Recovery
Immediately After Disaster Recovery:
1. Change Passwords (if compromise suspected)
   [ ] Unraid root password
   [ ] Vaultwarden master password
   [ ] Container admin passwords
   [ ] Pi-hole admin password
   [ ] PiKVM password
2. Review Access Logs
   # Check SSH attempts (searches both common log locations; Unraid uses /var/log/syslog)
   grep "Failed password" /var/log/syslog /var/log/auth.log 2>/dev/null | tail -50
   # Check NPM access (2>&1 includes the container's stderr)
   docker logs NginxProxyManager 2>&1 | grep -i error
   # Check Gitea access
   docker logs Gitea 2>&1 | grep -i login
3. Verify Firewall Rules
   iptables -L -n -v
4. Check for Unauthorized Changes
   # Review Docker containers
   docker ps -a
   # Check cron jobs
   crontab -l
   # Review network interfaces
   ip addr show
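Fifty raw log lines are hard to judge. Counting failed logins per source IP shows at a glance whether someone is hammering SSH. This sketch assumes the standard OpenSSH message format, "Failed password for [invalid user] NAME from IP port N ssh2". `failed_ssh_by_ip` is a hypothetical helper.

```shell
#!/bin/bash
# failed_ssh_by_ip  (hypothetical helper)
# Reads sshd log lines on stdin, prints "IP COUNT" for failed password
# attempts, most frequent first.
failed_ssh_by_ip() {
  grep 'Failed password' \
    | awk '{ for (i = 1; i < NF; i++) if ($i == "from") { print $(i + 1); break } }' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $2, $1 }'
}
```

Example: `cat /var/log/syslog /var/log/auth.log 2>/dev/null | failed_ssh_by_ip | head -20`. Look closely at any address outside your LAN (192.168.68.0/22) and the Tailscale range (100.64.0.0/10).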
📝 Documentation Updates After Incident
What to Document:
1. What Happened:
   - Date/time of incident
   - Symptoms observed
   - Root cause (if determined)
   - Duration of outage
2. What You Did:
   - Steps taken to recover
   - What worked / didn't work
   - Resources used (forums, docs, etc.)
   - Time to recovery
3. Lessons Learned:
   - What could prevent this in the future
   - Process improvements needed
   - Documentation gaps discovered
   - Backup improvements needed
4. Action Items:
   - Backups to implement/improve
   - Monitoring to add
   - Scripts to create
   - Hardware to replace/upgrade
Where to Document:
- Create incident report: `docs/incidents/YYYY-MM-DD-incident-name.md`
- Update this quick-start guide with new procedures
- Add to troubleshooting section if recurring issue
- Commit to Gitea with detailed message
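Starting each incident report from the same skeleton keeps reports consistent and quick to write. This minimal sketch creates `docs/incidents/YYYY-MM-DD-<name>.md` with the four sections above. `new_incident` is a hypothetical helper, and the repository path in the example is a placeholder.

```shell
#!/bin/bash
# new_incident REPO_DIR NAME  (hypothetical helper)
# Creates REPO_DIR/docs/incidents/<today>-NAME.md, refusing to overwrite,
# and prints the new file's path.
new_incident() {
  local repo=$1 slug=$2
  local file="$repo/docs/incidents/$(date +%F)-${slug}.md"
  if [ -e "$file" ]; then
    echo "exists: $file" >&2
    return 1
  fi
  mkdir -p "$(dirname "$file")"
  cat > "$file" <<EOF
# Incident: ${slug} ($(date +%F))

## What Happened
- Date/time:
- Symptoms:
- Root cause:
- Duration:

## What You Did
- Recovery steps:
- What worked / didn't work:
- Resources used:
- Time to recovery:

## Lessons Learned
-

## Action Items
- [ ]
EOF
  echo "$file"
}
```

Example: `new_incident ~/homelab-docs usb-boot-failure` prints the new file's path. Fill it in, then commit it to Gitea as usual.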
🚀 Normal Startup Sequence
From Cold Boot:
1. Power on server
↓
2. BIOS POST (~30 seconds)
- Hardware check
- Memory test
- Drive detection
↓
3. Unraid boots from USB (~1-2 minutes)
- Linux kernel loads
- Unraid OS starts
↓
4. Network initializes
- br0 interface up
- Gets IP: 192.168.68.51
↓
5. Array auto-starts (if configured)
- Parity disk: sdb
- Data disk: sdc
- Cache: nvme1n1p1
↓
6. Docker service starts
- docker0 bridge created
- Networks initialized
↓
7. Containers auto-start (if enabled)
- Infrastructure services first
- Then application services
↓
8. Services available (~3-5 minutes total)
✅ Ready to use!
Expected Boot Time: 3-5 minutes
If Taking Longer: Check system log for errors
🎯 Quick Health Check Command
Run After Any Restart:
# Quick one-liner health check (';' keeps one failing check from skipping or mislabeling the next)
docker ps --format "table {{.Names}}\t{{.Status}}"; \
df -h | grep -E "cache|disk1"; \
ping -c 2 192.168.68.1 >/dev/null && echo "Network: OK" || echo "Network: FAIL"
📚 Related Documentation
- Network Issues: See `network-map.md`
- Service Details: See `service-inventory.md`
- Container Configs: See `docker-compose/` (when created)
- Main Overview: See `README.md`
🆘 True Emergency - Complete System Down
If everything is down and you need immediate help:
1. Access via PiKVM
   - https://192.168.68.53
   - Get console access
   - View what's happening
2. Check Physical Server
   - Power LED on?
   - Fans spinning?
   - Drives spinning up?
   - Network activity lights?
3. Try Safe Mode Boot
   - Select a Safe Mode entry (plugins disabled) from the Unraid boot menu via PiKVM
   - Diagnose from console
4. Community Help
   - Unraid Discord (fastest response)
   - Forums with diagnostics ZIP
   - r/unraid for quick questions
5. Document Everything
   - Take photos/screenshots via PiKVM
   - Note exact error messages
   - Record what you tried
   - Timeline of events
💡 Pro Tips
1. Test Your Backups
   - Restore test annually
   - Verify data integrity
   - Practice recovery procedures
2. Keep This Guide Accessible
   - Save offline copy to phone/laptop
   - Print critical sections
   - Bookmark in browser
3. Automate Where Possible
   - Schedule backup scripts
   - Set up monitoring alerts
   - Use User Scripts plugin
4. Document As You Go
   - Update after fixing issues
   - Add new procedures discovered
   - Note what worked/didn't work
Last Updated: October 31, 2025
Next Review: Quarterly or after incidents
Maintained By: Weston
Remember: Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help!
Keep this guide accessible even when the server is down!
💡 Pro Tip: Save a copy to your phone/laptop/OneDrive!
🚀 You've got this!