# 🚀 Quick Start & Emergency Recovery Guide

**Purpose:** Get your homelab back online quickly after disaster
**Target Time:** 30-60 minutes to basic functionality
**Last Updated:** October 31, 2025

---

## 🎯 Quick Access Reference

### Essential URLs

| Service | URL | Default Credentials |
|---------|-----|---------------------|
| **Unraid Dashboard** | http://192.168.68.51 | root / (your password) |
| **Gitea** | https://gitea.segelschiff.app | Weston / (your password) |
| **Vaultwarden** | http://192.168.68.51:4743 | Master password |
| **NPM Admin** | http://192.168.68.51:7818 | admin@example.com / changeme (first login) |
| **Pi-hole** | http://192.168.68.61/admin | (your password) |
| **PiKVM** | https://192.168.68.53 | admin / admin (default) |

### SSH Access

```bash
# Local network
ssh root@192.168.68.51

# Via Tailscale (from anywhere)
ssh root@100.122.220.126

# Emergency: Use PiKVM for console access
# https://192.168.68.53
```

---

## 🆘 Emergency Recovery Scenarios

### Scenario 1: Server Won't Boot 🚨

**Symptoms:**
- No network connectivity to 192.168.68.51
- Unraid WebUI unreachable
- No response to ping

**Recovery Steps:**

1. **Physical Check** (via PiKVM or in person)
   ```
   [ ] Server has power (check LED)
   [ ] Network cable connected to eth0
   [ ] Monitor shows output (via PiKVM)
   [ ] USB boot drive is present and detected
   ```

2. **Use PiKVM for Remote Console**
   - Access: https://192.168.68.53
   - Login: admin / admin
   - View boot process
   - Check BIOS/boot messages

3. **Common Boot Issues**

   **USB Boot Drive Failure** (Most common!)
   ```
   Symptoms: "Boot device not found" or similar
   Fix:
   1. Have backup USB ready
   2. Shut down server (via PiKVM power control)
   3. Replace USB boot drive
   4. Power on
   5. Restore configuration from backup
   ```

   **BIOS Settings Changed**
   ```
   Fix:
   1. Enter BIOS (DEL/F2 during boot)
   2. Load defaults
   3. Verify boot order (USB first)
   4. Save and exit
   ```

   **Hardware Failure**
   ```
   Check:
   1. RAM seated properly
   2. All drives detected in BIOS
   3. CPU fan spinning
   4. No error beeps
   ```

4. **Boot from Backup USB**
   ```
   Steps:
   1. Power off server
   2. Insert backup USB boot drive
   3. Power on
   4. Verify boot successful
   5. Restore configuration:
      - Tools → Flash Backup → Browse → Select backup ZIP
      - Reboot
   ```

**Prevention:**
- ✅ Keep USB flash backup updated (weekly)
- ✅ Store backup USB in safe location
- ✅ Document BIOS settings (screenshots via PiKVM)

---

### Scenario 2: Lost Admin Password

**Unraid Root Password Reset:**

1. **Via PiKVM Console**
   ```
   1. Access PiKVM: https://192.168.68.53
   2. View console in browser
   3. Wait for login prompt
   4. Press Ctrl+Alt+F2 (via PiKVM keyboard)
   5. At terminal: passwd root
   6. Enter new password twice
   7. Press Ctrl+Alt+F1 to return to GUI
   8. Update documentation
   ```

2. **Via Physical Access**
   ```
   1. Connect monitor and keyboard to server
   2. Press Ctrl+Alt+F2
   3. Run: passwd root
   4. Set new password
   5. Press Ctrl+Alt+F1
   ```

Both methods only work if a console session is already logged in as root, because `passwd` cannot be run from the login prompt. If no session is open, follow Unraid's official password reset procedure, which is performed on the USB flash drive from another computer.

**Container Passwords:**
- Check `/mnt/user/appdata/<container>/config`
- Review environment variables in Docker templates
- Use Vaultwarden if accessible
- Check this documentation repo in Gitea

---

### Scenario 3: Container Won't Start

**Quick Diagnosis:**

```bash
# Check container status
docker ps -a | grep <container>

# View recent logs
docker logs --tail 100 <container>

# Look for errors
docker inspect <container> | grep -i error
```

**Common Fixes:**

**Port Conflict:**
```bash
# Find what's using the port
netstat -tulpn | grep <port>

# Example: Port 3000 already in use
netstat -tulpn | grep 3000

# Stop conflicting service
docker stop <container>
```

**Volume Permission Issues:**
```bash
# Check ownership
ls -la /mnt/user/appdata/<container>

# Fix permissions (Unraid standard: 99:100)
chown -R 99:100 /mnt/user/appdata/<container>

# Example: Fix Vaultwarden
chown -R 99:100 /mnt/user/appdata/vaultwarden
```

**Dependency Missing:**
```bash
# Example: Guacamole needs MariaDB
docker start mariadb
sleep 10  # Wait for database initialization
docker start ApacheGuacamole

# Verify dependency is running
docker ps | grep mariadb
```

**Resource Exhaustion:**
```bash
# Check cache usage
df -h /mnt/cache

# If cache full (>90%), clean up
docker system prune -a  # ⚠️ REMOVES UNUSED IMAGES!

# Or free space manually
# See service-inventory.md for cleanup recommendations
```

---

### Scenario 4: Network Connectivity Issues

**Can't Access from LAN:**

```bash
# SSH into Unraid (via PiKVM if network down)
ssh root@192.168.68.51

# Check if br0 is up
ip addr show br0
# Should show: 192.168.68.51/22

# Verify IP and routes
ip route | grep default
# Should show: default via 192.168.68.1

# Test router connectivity
ping -c 3 192.168.68.1

# Test internet
ping -c 3 8.8.8.8

# Test DNS (Pi-hole)
nslookup google.com 192.168.68.61
```

**Fix Network Issues:**

```bash
# Restart networking (from console/PiKVM)
/etc/rc.d/rc.inet1 restart

# If that doesn't work, reboot
reboot
```

**Can't Access Containers:**

```bash
# Check Docker network
docker network inspect bridge

# Verify container IP
docker inspect <container> | grep IPAddress

# Test from Unraid host
curl http://172.17.0.5:8080  # Example: open-webui

# Test port mapping
curl http://192.168.68.51:3000  # Should reach open-webui
```

**DNS Not Resolving:**

```bash
# Test Pi-hole directly
nslookup google.com 192.168.68.61

# If Pi-hole down, check Pi Zero
ping 192.168.68.61

# SSH to Pi-hole
ssh pi@192.168.68.61

# Check Pi-hole status
pihole status

# Restart if needed
pihole restartdns
```

---

### Scenario 5: Array Won't Start

**Symptoms:**
- Unraid GUI accessible but array shows "Stopped"
- Disks show errors or are missing

**Troubleshooting:**

```bash
# Check disk health
smartctl -a /dev/sdb  # Parity
smartctl -a /dev/sdc  # Disk 1

# View disk assignments
cat /boot/config/disk.cfg

# Check for filesystem errors (read-only check)
xfs_repair -n /dev/md1p1
```

**Common Causes:**
- Parity sync in progress (wait for completion)
- Disk failed (check SMART, may need replacement)
- Unclean shutdown (filesystem check required)
- Disk assignment changed

**Recovery:**

1. **Start Array in Maintenance Mode**
   - Click "Start" in Unraid GUI
   - Select "Maintenance mode" if prompted
   - Run filesystem check if prompted

2. **Review Logs**
   - Settings → System Log
   - Look for disk errors
   - Check for power events

3. **If Disk Failed**
   - Follow Unraid disk replacement procedure
   - Do NOT format or write to disk unnecessarily
   - Seek help in Unraid forums if uncertain

---

## 🔧 Critical Service Restart Procedures

### Restart Core Services (Proper Order)

**1. Infrastructure First:**
```bash
# Start reverse proxy (for routing)
docker start NginxProxyManager

# Wait for it to be ready
sleep 5
docker ps | grep NginxProxyManager

# Start tunnel (for remote access)
docker start Cloudflared

# Verify both running
docker ps | grep -E "NginxProxyManager|Cloudflared"
```

**2. Security Services:**
```bash
# Password manager (critical!)
docker start vaultwarden

# Wait for healthy status
sleep 10
docker ps | grep vaultwarden
# Should show "(healthy)"

# If not healthy, check logs
docker logs --tail 50 vaultwarden
```

**3. Development Tools:**
```bash
# Git server
docker start Gitea

# Wait for initialization
sleep 5

# Remote access gateway
docker start ApacheGuacamole
# Note: Needs MariaDB if configured
```

**4. Monitoring (IMPORTANT!):**
```bash
# Database first
docker start Influxdb

# Wait for DB to initialize
sleep 15

# Then metrics collector
docker start Telegraf

# Finally visualization
docker start Grafana

# Verify all running
docker ps | grep -E "Influxdb|Telegraf|Grafana"
```

**5. Optional Services:**
```bash
# LLM backend
docker start ollama
sleep 10

# LLM interface
docker start open-webui

# Wait for healthy
docker ps | grep open-webui
```

---

### Stop All Services Gracefully

```bash
# Stop all running containers
docker stop $(docker ps -q)

# Verify all stopped
docker ps
# Should show empty output

# Wait before stopping array
sleep 5

# Stop array (from GUI)
# Main → Array Operation → Stop
```

---

## 📦 Backup & Restore Procedures

### USB Flash Backup (Unraid Configuration)

**Create Backup:**

1. Navigate to: **Main → Flash → Flash Backup**
2. Click "Backup Now"
3. Download ZIP file (e.g., `unraid-flash-backup-20251031.zip`)
4. Store securely OFF-SERVER:
   - OneDrive: `/z_Unraid/Backups/`
   - External drive
   - Cloud storage

**Restore from Backup:**

```
1. Format new USB drive (if needed)
2. Copy backup ZIP to new USB
3. Extract contents to root of USB
   - config/ directory
   - bzimage, bzroot, etc.
4. Safely eject USB
5. Boot from new USB
6. Configuration restored automatically
```

**Frequency:**
- Weekly minimum
- After ANY configuration change
- Before major updates

---

### Container Data Backup

**Critical Directories:**

```
Priority 1 (CRITICAL):
/mnt/user/appdata/vaultwarden/          🚨 Your passwords!
/mnt/user/appdata/gitea/                🚨 Your code repositories!

Priority 2 (Important):
/mnt/user/appdata/NginxProxyManager/    Proxy configs
/mnt/user/appdata/Grafana/              Dashboards
/mnt/user/appdata/Influxdb/             Metrics history

Priority 3 (Optional):
/mnt/user/appdata/open-webui/           LLM chat history
```

**Quick Backup Script:**

```bash
#!/bin/bash
# Save as: /mnt/user/scripts/backup-critical.sh

BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

echo "Stopping containers..."
docker stop vaultwarden Gitea NginxProxyManager

echo "Backing up data..."
tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" /mnt/user/appdata/vaultwarden
tar -czf "$BACKUP_DIR/gitea.tar.gz" /mnt/user/appdata/gitea
tar -czf "$BACKUP_DIR/npm.tar.gz" /mnt/user/appdata/NginxProxyManager

echo "Restarting containers..."
docker start vaultwarden Gitea NginxProxyManager

echo "✅ Backup complete: $BACKUP_DIR"
ls -lh "$BACKUP_DIR"
```

**Make Executable:**
```bash
chmod +x /mnt/user/scripts/backup-critical.sh
```

**Run Manually:**
```bash
/mnt/user/scripts/backup-critical.sh
```

**Schedule (User Scripts Plugin):**
- Frequency: Daily at 2 AM
- Retention: Keep last 30 days

---

**Restore from Backup:**

```bash
# Example: Restore Vaultwarden
docker stop vaultwarden

# Backup current (corrupted) data
mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old

# Extract backup (archive paths are relative to /, hence -C /)
tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C /

# Restart container
docker start vaultwarden

# Verify working
curl http://192.168.68.51:4743
```

---

## ⚡ Quick Commands Reference

### System Status

```bash
# System uptime and load
uptime

# Resource usage
free -h
df -h

# Array status
mdcmd status

# Docker container summary
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Temperature (if sensors installed)
sensors

# Disk health quick check
smartctl -H /dev/sdb  # Parity
smartctl -H /dev/sdc  # Disk 1
```

### Docker Quick Commands

```bash
# Start all stopped (exited) containers
docker start $(docker ps -aq --filter status=exited)

# Stop all running containers
docker stop $(docker ps -q)

# View logs (last 50 lines)
docker logs --tail 50 <container>

# Follow logs in real-time
docker logs -f <container>

# Restart container
docker restart <container>

# Remove container (⚠️ will lose non-volume data!)
docker rm <container>

# Clean up unused resources
docker system prune            # Safe cleanup
docker system prune -a         # ⚠️ Removes unused images too!
docker system prune --volumes  # ⚠️ Removes unused volumes!
```

### Network Diagnostics

```bash
# Check all interfaces
ip addr show

# Test key infrastructure
ping -c 3 192.168.68.1   # Router
ping -c 3 192.168.68.51  # Unraid
ping -c 3 192.168.68.61  # Pi-hole
ping -c 3 8.8.8.8        # Internet

# DNS resolution test
nslookup google.com
nslookup google.com 192.168.68.61  # Test Pi-hole specifically

# Check listening ports
netstat -tulpn | grep LISTEN

# Test specific port
nc -zv 192.168.68.51 3002          # Example: Gitea
curl -I http://192.168.68.51:3002  # HTTP test
```

### Quick Health Check Script

```bash
#!/bin/bash
# Save as: /mnt/user/scripts/health-check.sh

echo "=== Unraid Health Check ==="
echo ""

echo "1. Array Status:"
mdcmd status | grep mdState
echo ""

echo "2. Running Containers:"
docker ps --format "table {{.Names}}\t{{.Status}}"
echo ""

echo "3. Disk Usage:"
df -h | grep -E "cache|disk1|Filesystem"
echo ""

echo "4. Network Connectivity:"
ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo "  Router: ✅ OK" || echo "  Router: ❌ FAIL"
ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo "  Internet: ✅ OK" || echo "  Internet: ❌ FAIL"
ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo "  Pi-hole: ✅ OK" || echo "  Pi-hole: ❌ FAIL"
echo ""

echo "5. Critical Services:"
curl -s http://localhost:4743 >/dev/null && echo "  Vaultwarden: ✅ OK" || echo "  Vaultwarden: ❌ DOWN"
curl -s http://localhost:3002 >/dev/null && echo "  Gitea: ✅ OK" || echo "  Gitea: ❌ DOWN"
curl -s http://localhost:7818 >/dev/null && echo "  NPM: ✅ OK" || echo "  NPM: ❌ DOWN"
echo ""

echo "=== Health Check Complete ==="
```

**Run:** `bash /mnt/user/scripts/health-check.sh`

---

## 📞 Getting Help

### Pre-flight Checks

Before asking for help, gather this information:

1. **System Diagnostics**
   - Unraid WebGUI: Tools → Diagnostics → Download
   - Creates ZIP with all logs

2. **Container Logs**
   ```bash
   # Capture stdout and stderr of the container
   docker logs <container> > container-logs.txt 2>&1
   ```

3. **Network Configuration**
   ```bash
   ip addr show > network-config.txt
   ip route show >> network-config.txt
   ```

4. **Disk Status**
   ```bash
   smartctl -a /dev/sdb > disk-smart.txt
   smartctl -a /dev/sdc >> disk-smart.txt
   ```

### Community Resources

- **Unraid Forums:** https://forums.unraid.net/
  - Post diagnostics ZIP
  - Be specific about symptoms
  - Include what you've tried

- **r/unraid:** https://reddit.com/r/unraid
  - Quick questions
  - Share diagnostics in pastebin

- **Discord:** Unraid Official Discord
  - Real-time help
  - Active community

### Emergency Contacts

```
ISP Support: [Your ISP Phone Number]
Unraid License: [Store in secure location]
USB Backup Location: [Document where stored]
Off-site Backup: [If applicable]
```

---

## 🎓 Post-Recovery Checklist

After restoring from disaster:

```
[ ] Unraid array started successfully
[ ] All critical services running
    [ ] NginxProxyManager
    [ ] Cloudflared
    [ ] Vaultwarden
    [ ] Gitea
[ ] Network connectivity verified
    [ ] Can access Unraid WebUI
    [ ] Can ping router (192.168.68.1)
    [ ] Internet working
    [ ] DNS resolving (Pi-hole)
[ ] Vaultwarden accessible (test password retrieval)
[ ] Gitea accessible (verify repositories intact)
[ ] NPM routing working (test reverse proxy)
[ ] Monitoring stack restarted
    [ ] Grafana
    [ ] InfluxDB
    [ ] Telegraf
[ ] External access working
    [ ] Tailscale connected
    [ ] Cloudflare tunnel active
[ ] Backups verified and up-to-date
[ ] Documentation updated with lessons learned
[ ] Incident documented in change log (Gitea)
```

---

## 🔒 Security After Recovery

**Immediately After Disaster Recovery:**

1. **Change Passwords** (if compromise suspected)
   ```
   [ ] Unraid root password
   [ ] Vaultwarden master password
   [ ] Container admin passwords
   [ ] Pi-hole admin password
   [ ] PiKVM password
   ```

2. **Review Access Logs**
   ```bash
   # Check SSH attempts (Unraid logs sshd to the system log)
   grep "Failed password" /var/log/syslog | tail -50

   # Check NPM access
   docker logs NginxProxyManager 2>&1 | grep -i error

   # Check Gitea access
   docker logs Gitea 2>&1 | grep -i login
   ```

3. **Verify Firewall Rules**
   ```bash
   iptables -L -n -v
   ```

4. **Check for Unauthorized Changes**
   ```bash
   # Review Docker containers
   docker ps -a

   # Check cron jobs
   crontab -l

   # Review network interfaces
   ip addr show
   ```

---

## 📝 Documentation Updates After Incident

**What to Document:**

1. **What Happened:**
   - Date/time of incident
   - Symptoms observed
   - Root cause (if determined)
   - Duration of outage

2. **What You Did:**
   - Steps taken to recover
   - What worked / didn't work
   - Resources used (forums, docs, etc.)
   - Time to recovery

3. **Lessons Learned:**
   - What could prevent this in future
   - Process improvements needed
   - Documentation gaps discovered
   - Backup improvements needed

4. **Action Items:**
   - Backups to implement/improve
   - Monitoring to add
   - Scripts to create
   - Hardware to replace/upgrade

**Where to Document:**
- Create incident report: `docs/incidents/YYYY-MM-DD-incident-name.md`
- Update this quick-start guide with new procedures
- Add to troubleshooting section if recurring issue
- Commit to Gitea with detailed message

---

## 🚀 Normal Startup Sequence

**From Cold Boot:**

```
1. Power on server
   ↓
2. BIOS POST (~30 seconds)
   - Hardware check
   - Memory test
   - Drive detection
   ↓
3. Unraid boots from USB (~1-2 minutes)
   - Linux kernel loads
   - Unraid OS starts
   ↓
4. Network initializes
   - br0 interface up
   - Gets IP: 192.168.68.51
   ↓
5. Array auto-starts (if configured)
   - Parity disk: sdb
   - Data disk: sdc
   - Cache: nvme1n1p1
   ↓
6. Docker service starts
   - docker0 bridge created
   - Networks initialized
   ↓
7. Containers auto-start (if enabled)
   - Infrastructure services first
   - Then application services
   ↓
8. Services available (~3-5 minutes total)

✅ Ready to use!
```

**Expected Boot Time:** 3-5 minutes
**If Taking Longer:** Check system log for errors

---

## 🎯 Quick Health Check Command

**Run After Any Restart:**

```bash
# Quick one-liner health check
# Steps are separated with ';' so the OK/FAIL label reflects only the ping result
docker ps --format "table {{.Names}}\t{{.Status}}"; \
df -h | grep -E "cache|disk1"; \
ping -c 2 192.168.68.1 >/dev/null && echo "Network: OK" || echo "Network: FAIL"
```

---

## 📚 Related Documentation

- **Network Issues:** See `network-map.md`
- **Service Details:** See `service-inventory.md`
- **Container Configs:** See `docker-compose/` (when created)
- **Main Overview:** See `README.md`

---

## 🆘 True Emergency - Complete System Down

**If everything is down and you need immediate help:**

1. **Access via PiKVM**
   - https://192.168.68.53
   - Get console access
   - View what's happening

2. **Check Physical Server**
   - Power LED on?
   - Fans spinning?
   - Drives spinning up?
   - Network activity lights?

3. **Try Safe Mode Boot**
   - Select a Safe Mode entry in the Unraid boot menu (boots without plugins)
   - Diagnose from console

4. **Community Help**
   - Unraid Discord (fastest response)
   - Forums with diagnostics ZIP
   - r/unraid for quick questions

5. **Document Everything**
   - Take photos/screenshots via PiKVM
   - Note exact error messages
   - Record what you tried
   - Timeline of events

---

## 💡 Pro Tips

1. **Test Your Backups**
   - Restore test annually
   - Verify data integrity
   - Practice recovery procedures

2. **Keep This Guide Accessible**
   - Save offline copy to phone/laptop
   - Print critical sections
   - Bookmark in browser

3. **Automate Where Possible**
   - Schedule backup scripts
   - Set up monitoring alerts
   - Use User Scripts plugin

4. **Document As You Go**
   - Update after fixing issues
   - Add new procedures discovered
   - Note what worked/didn't work

---

**Last Updated:** October 31, 2025
**Next Review:** Quarterly or after incidents
**Maintained By:** Weston

---

**Remember:** Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help!
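
The backup schedule in this guide asks for 30 days of retention, but `backup-critical.sh` never deletes old runs. The following is a minimal sketch of a pruning step, not a tested part of this setup. It assumes backups sit in directories named `YYYYMMDD_HHMMSS` directly under `/mnt/user/backups` (the layout the backup script creates) and that GNU `date` is available. The function name `prune_backups` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts).
# Removes backup directories named YYYYMMDD_HHMMSS whose date is older than
# the retention window. Age is taken from the directory NAME, not its mtime.

prune_backups() {
    backup_root="$1"      # e.g. /mnt/user/backups (assumed location)
    keep_days="${2:-30}"  # retention window in days

    # Oldest date (YYYYMMDD) that is still kept; requires GNU date
    cutoff="$(date -d "${keep_days} days ago" +%Y%m%d)" || return 1

    for dir in "$backup_root"/*/; do
        [ -d "$dir" ] || continue
        name="$(basename "$dir")"

        # Skip anything that does not match the backup naming pattern
        case "$name" in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) continue ;;
        esac

        # Compare the date part of the name against the cutoff
        if [ "${name%%_*}" -lt "$cutoff" ]; then
            echo "Removing old backup: $dir"
            rm -rf -- "$dir"
        fi
    done
}

# Example call (commented out on purpose, review before enabling):
# prune_backups /mnt/user/backups 30
```

One way to use it would be to paste the function into `backup-critical.sh` and call it after the "Backup complete" message, so pruning only runs once a fresh backup exists. Directories that do not match the timestamp pattern are left alone.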

**Keep this guide accessible even when the server is down!**

💡 **Pro Tip:** Save a copy to your phone/laptop/OneDrive!

🚀 **You've got this!**
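
The service restart procedures in this guide use fixed `sleep` delays before checking a container. As an alternative, here is a rough sketch that polls Docker's reported state until the container is ready or a timeout expires. The helper name `wait_healthy` and the 2-second poll interval are assumptions for illustration. Containers without a health check are accepted as soon as they report `running`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original procedures).
# Polls `docker inspect` until the container is "healthy", or "running" when
# the image defines no HEALTHCHECK. Gives up after the timeout.

wait_healthy() {
    name="$1"            # container name, e.g. vaultwarden
    timeout="${2:-60}"   # seconds to wait before giving up
    waited=0
    status=""

    while [ "$waited" -lt "$timeout" ]; do
        # Health status if a health check exists, otherwise the plain state
        status="$(docker inspect --format \
            '{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}' \
            "$name" 2>/dev/null)"

        case "$status" in
            healthy|running)
                echo "$name: $status"
                return 0
                ;;
        esac

        sleep 2
        waited=$((waited + 2))
    done

    echo "$name: not ready after ${timeout}s (last status: ${status:-unknown})" >&2
    return 1
}
```

For example, `docker start vaultwarden && wait_healthy vaultwarden 60` could stand in for the `sleep 10` in the Security Services step. A non-zero exit code signals that the logs should be checked.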