homelab/quick-start.md
weston 6cbee11482 Phase 1 Complete: Foundation documentation
Added comprehensive homelab documentation:

README.md:
- Hardware inventory and specifications
- Network architecture overview
- Running services catalog
- Quick reference commands
- Project goals and roadmap

docs/network-map.md:
- All device IP assignments
- Port reference guide
- DNS configuration (Pi-hole + Unbound)
- Remote access setup (Tailscale + Cloudflare)
- Troubleshooting commands

docs/service-inventory.md:
- All 32 Docker containers cataloged
- Running services analysis (6 containers)
- Stopped services review (26 containers)
- Resource usage and recommendations
- Container decision matrix
- Cleanup plan to free 40GB
- Security recommendations
- Prioritized action plan

docs/quick-start.md:
- Emergency recovery procedures
- Service restart sequences
- Backup/restore guides with scripts
- Troubleshooting by scenario
- Health check automation
- Post-recovery checklist
- Common problem solutions

This establishes the foundation for all future homelab projects.
Phase 1 documentation complete! 🎉
2025-11-01 00:42:34 +01:00


🚀 Quick Start & Emergency Recovery Guide

Purpose: Get your homelab back online quickly after disaster
Target Time: 30-60 minutes to basic functionality
Last Updated: October 31, 2025


🎯 Quick Access Reference

Essential URLs

| Service | URL | Default Credentials |
|---|---|---|
| Unraid Dashboard | http://192.168.68.51 | root / (your password) |
| Gitea | https://gitea.segelschiff.app | Weston / (your password) |
| Vaultwarden | http://192.168.68.51:4743 | Master password |
| NPM Admin | http://192.168.68.51:7818 | admin@example.com / changeme (first login) |
| Pi-hole | http://192.168.68.61/admin | (your password) |
| PiKVM | https://192.168.68.53 | admin / admin (default) |

SSH Access

# Local network
ssh root@192.168.68.51

# Via Tailscale (from anywhere)
ssh root@100.122.220.126

# Emergency: Use PiKVM for console access
# https://192.168.68.53

🆘 Emergency Recovery Scenarios

Scenario 1: Server Won't Boot 🚨

Symptoms:

  • No network connectivity to 192.168.68.51
  • Unraid WebUI unreachable
  • No response to ping

Recovery Steps:

  1. Physical Check (via PiKVM or in person)

    [ ] Server has power (check LED)
    [ ] Network cable connected to eth0
    [ ] Monitor shows output (via PiKVM)
    [ ] USB boot drive is present and detected
    
  2. Use PiKVM for Remote Console (https://192.168.68.53)

  3. Common Boot Issues

    USB Boot Drive Failure (Most common!)

    Symptoms: "Boot device not found" or similar
    
    Fix:
    1. Have backup USB ready
    2. Shut down server (via PiKVM power control)
    3. Replace USB boot drive
    4. Power on
    5. Restore configuration from backup
    

    BIOS Settings Changed

    Fix:
    1. Enter BIOS (DEL/F2 during boot)
    2. Load defaults
    3. Verify boot order (USB first)
    4. Save and exit
    

    Hardware Failure

    Check:
    1. RAM seated properly
    2. All drives detected in BIOS
    3. CPU fan spinning
    4. No error beeps
    
  4. Boot from Backup USB

    Steps:
    1. Power off server
    2. Insert backup USB boot drive
    3. Power on
    4. Verify boot successful
    5. Restore configuration:
       - Tools → Flash Backup → Browse → Select backup ZIP
       - Reboot
    

Prevention:

  • Keep USB flash backup updated (weekly)
  • Store backup USB in safe location
  • Document BIOS settings (screenshots via PiKVM)

Scenario 2: Lost Admin Password

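Unraid Root Password Reset:

Note: passwd root needs a root shell, so the steps below only help while a root login is still possible (for example an SSH session that is still open). If no root login works at all, follow the official Unraid password-reset procedure, which clears the stored password files in the config/ folder of the USB flash drive from another computer.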

  1. Via PiKVM Console

    1. Access PiKVM: https://192.168.68.53
    2. View console in browser
    3. Wait for login prompt
    4. Press Ctrl+Alt+F2 (via PiKVM keyboard)
    5. At terminal: passwd root
    6. Enter new password twice
    7. Press Ctrl+Alt+F1 to return to GUI
    8. Update documentation
    
  2. Via Physical Access

    1. Connect monitor and keyboard to server
    2. Press Ctrl+Alt+F2
    3. Run: passwd root
    4. Set new password
    5. Press Ctrl+Alt+F1
    

Container Passwords:

  • Check /mnt/user/appdata/<service>/config
  • Review environment variables in Docker templates
  • Use Vaultwarden if accessible
  • Check this documentation repo in Gitea

Scenario 3: Container Won't Start

Quick Diagnosis:

# Check container status
docker ps -a | grep <container_name>

# View recent logs
docker logs --tail 100 <container_name>

# Look for errors
docker inspect <container_name> | grep -i error

Common Fixes:

Port Conflict:

# Find what's using the port
netstat -tulpn | grep <port>

# Example: Port 3000 already in use
netstat -tulpn | grep 3000

# Stop conflicting service
docker stop <conflicting_container>
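When the conflict is between two containers, the port column of docker ps can name the culprit directly. The sketch below is one way to do that, assuming the port is published by a bridge-mode container; `containers_on_port` is a hypothetical helper name, and it will not see host processes, host-network containers, or published port ranges.

```shell
# Hypothetical helper: print the names of running containers that publish
# the given HOST port. It matches the ":<port>->" part of the Ports column,
# e.g. "0.0.0.0:3000->8080/tcp".
containers_on_port() {
    docker ps --format '{{.Names}} {{.Ports}}' |
        awk -v pat=":$1->" 'index($0, pat) { print $1 }'
}
```

Usage: `containers_on_port 3000` prints one container name per line, or nothing if no container publishes that port. In that case fall back to netstat to look for a host process.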

Volume Permission Issues:

# Check ownership
ls -la /mnt/user/appdata/<container_name>

# Fix permissions (Unraid standard: 99:100)
chown -R 99:100 /mnt/user/appdata/<container_name>

# Example: Fix Vaultwarden
chown -R 99:100 /mnt/user/appdata/vaultwarden

Dependency Missing:

# Example: Guacamole needs MariaDB
docker start mariadb
sleep 10  # Wait for database initialization
docker start ApacheGuacamole

# Verify dependency is running
docker ps | grep mariadb
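A fixed sleep can be too short on a busy server and needlessly long on an idle one. As an alternative, the sketch below polls the container state until Docker reports it as running. `wait_for_running` is a hypothetical helper name, and "running" only means the process has started, not that the database already accepts connections.

```shell
# Hypothetical helper: wait until a container is reported as running.
# Usage: wait_for_running <container_name> [timeout_seconds]
# Runs in a subshell, so its variables do not leak into the caller.
wait_for_running() (
    name=$1
    timeout=${2:-30}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        state=$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null) || state=""
        if [ "$state" = "true" ]; then
            exit 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    exit 1
)
```

Usage: `docker start mariadb && wait_for_running mariadb 30 && docker start ApacheGuacamole`.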

Resource Exhaustion:

# Check cache usage
df -h /mnt/cache

# If cache full (>90%), clean up
docker system prune -a  # ⚠️ Removes ALL stopped containers, unused networks, and unused images!

# Or free space manually
# See service-inventory.md for cleanup recommendations
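The 90% threshold above can be checked without reading the df output by eye. This is a minimal sketch built on the POSIX output format of `df -P`; `usage_percent` and `over_threshold` are hypothetical helper names, and the 90 default simply mirrors the figure used in this guide.

```shell
# Hypothetical helper: print the usage of the filesystem holding PATH
# as a bare number (e.g. 42 for 42%).
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed if usage is above the threshold (default 90).
over_threshold() (
    pct=$(usage_percent "$1")
    [ -n "$pct" ] && [ "$pct" -gt "${2:-90}" ]
)
```

Usage: `over_threshold /mnt/cache 90 && echo "Cache above 90%, review service-inventory.md before pruning"`.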

Scenario 4: Network Connectivity Issues

Can't Access from LAN:

# SSH into Unraid (via PiKVM if network down)
ssh root@192.168.68.51

# Check if br0 is up
ip addr show br0
# Should show: 192.168.68.51/22

# Verify IP and routes
ip route | grep default
# Should show: default via 192.168.68.1

# Test router connectivity
ping -c 3 192.168.68.1

# Test internet
ping -c 3 8.8.8.8

# Test DNS (Pi-hole)
nslookup google.com 192.168.68.61

Fix Network Issues:

# Restart networking (from console/PiKVM)
/etc/rc.d/rc.inet1 restart

# If that doesn't work, reboot
reboot

Can't Access Containers:

# Check Docker network
docker network inspect bridge

# Verify container IP
docker inspect <container_name> | grep IPAddress

# Test from Unraid host
curl http://172.17.0.5:8080  # Example: open-webui

# Test port mapping
curl http://192.168.68.51:3000  # Should reach open-webui

DNS Not Resolving:

# Test Pi-hole directly
nslookup google.com 192.168.68.61

# If Pi-hole down, check Pi Zero
ping 192.168.68.61

# SSH to Pi-hole
ssh pi@192.168.68.61

# Check Pi-hole status
pihole status

# Restart if needed
pihole restartdns

Scenario 5: Array Won't Start

Symptoms:

  • Unraid GUI accessible but array shows "Stopped"
  • Disks show errors or missing

Troubleshooting:

# Check disk health
smartctl -a /dev/sdb  # Parity
smartctl -a /dev/sdc  # Disk 1

# View disk assignments
cat /boot/config/disk.cfg

# Check for filesystem errors (read-only check)
# The md device only exists while the array is started, so start it in Maintenance mode first
xfs_repair -n /dev/md1p1
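To scan several disks at once, the overall SMART verdict can be pulled out of `smartctl -H` for each device. This is a sketch only: `smart_summary` is a hypothetical helper name, /dev/sdb and /dev/sdc are the device names used in this guide, and the parsing assumes the "overall-health" line that smartctl prints for ATA disks.

```shell
# Hypothetical helper: print "<device>: <verdict>" for each device given.
# Prints UNKNOWN when no overall-health line is found (disk missing,
# unsupported device type, or smartctl failed).
smart_summary() (
    for dev in "$@"; do
        result=$(smartctl -H "$dev" 2>/dev/null |
            awk -F': *' '/overall-health/ { print $2 }')
        echo "$dev: ${result:-UNKNOWN}"
    done
)
```

Usage: `smart_summary /dev/sdb /dev/sdc`. A PASSED verdict is a coarse indicator; review the full `smartctl -a` output for reallocated or pending sectors before trusting a disk.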

Common Causes:

  • Parity sync in progress (wait for completion)
  • Disk failed (check SMART, may need replacement)
  • Unclean shutdown (filesystem check required)
  • Disk assignment changed

Recovery:

  1. Start Array in Maintenance Mode

    • In the Unraid GUI, go to Main → Array Operation and tick the "Maintenance mode" checkbox
    • Click "Start"
    • Run filesystem check if prompted
  2. Review Logs

    • Tools → System Log
    • Look for disk errors
    • Check for power events
  3. If Disk Failed

    • Follow Unraid disk replacement procedure
    • Do NOT format or write to disk unnecessarily
    • Seek help in Unraid forums if uncertain

🔧 Critical Service Restart Procedures

Restart Core Services (Proper Order)

1. Infrastructure First:

# Start reverse proxy (for routing)
docker start NginxProxyManager

# Wait for it to be ready
sleep 5
docker ps | grep NginxProxyManager

# Start tunnel (for remote access)
docker start Cloudflared

# Verify both running
docker ps | grep -E "NginxProxyManager|Cloudflared"

2. Security Services:

# Password manager (critical!)
docker start vaultwarden

# Wait for healthy status
sleep 10
docker ps | grep vaultwarden
# Should show "(healthy)"

# If not healthy, check logs
docker logs --tail 50 vaultwarden

3. Development Tools:

# Git server
docker start Gitea

# Wait for initialization
sleep 5

# Remote access gateway
docker start ApacheGuacamole
# Note: Needs MariaDB if configured

4. Monitoring (IMPORTANT!):

# Database first
docker start Influxdb

# Wait for DB to initialize
sleep 15

# Then metrics collector
docker start Telegraf

# Finally visualization
docker start Grafana

# Verify all running
docker ps | grep -E "Influxdb|Telegraf|Grafana"

5. Optional Services:

# LLM backend
docker start ollama
sleep 10

# LLM interface
docker start open-webui

# Wait for healthy
docker ps | grep open-webui
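The "wait for healthy" steps above can be scripted instead of relying on a fixed sleep followed by a manual check. The sketch below polls the health status that Docker records for containers with a health check. `health_status` and `wait_for_healthy` are hypothetical helper names, and which containers in this homelab define a health check is an assumption to verify per container.

```shell
# Hypothetical helper: print the health status of a container
# (starting, healthy, unhealthy) or "none" if it has no health check.
health_status() {
    docker inspect -f '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$1" 2>/dev/null
}

# Hypothetical helper: wait until the container reports "healthy".
# Usage: wait_for_healthy <container_name> [timeout_seconds]
# Exit codes: 0 healthy, 1 timed out, 2 container has no health check.
wait_for_healthy() (
    name=$1
    timeout=${2:-60}
    waited=0
    status=unknown
    while [ "$waited" -lt "$timeout" ]; do
        status=$(health_status "$name") || status="missing"
        case "$status" in
            healthy) exit 0 ;;
            none) echo "$name has no health check; verify it manually" >&2; exit 2 ;;
        esac
        sleep 2
        waited=$((waited + 2))
    done
    echo "$name not healthy after ${timeout}s (last status: $status)" >&2
    exit 1
)
```

Usage: `docker start vaultwarden && wait_for_healthy vaultwarden 60 || docker logs --tail 50 vaultwarden`.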

Stop All Services Gracefully

# Stop all running containers
docker stop $(docker ps -q)

# Verify all stopped
docker ps
# Should show empty output

# Wait before stopping array
sleep 5

# Stop array (from GUI)
# Main → Array Operation → Stop

📦 Backup & Restore Procedures

USB Flash Backup (Unraid Configuration)

Create Backup:

  1. Navigate to: Main → Flash → Flash Backup
  2. Click "Backup Now"
  3. Download ZIP file (e.g., unraid-flash-backup-20251031.zip)
  4. Store securely OFF-SERVER:
    • OneDrive: /z_Unraid/Backups/
    • External drive
    • Cloud storage

Restore from Backup:

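1. Format new USB drive as FAT32 with the volume label UNRAID
2. Copy backup ZIP to new USB
3. Extract contents to root of USB
   - config/ directory
   - bzimage, bzroot, etc.
4. Run the make_bootable script from the USB root (e.g., make_bootable.bat on Windows)
5. Safely eject USB
6. Boot from new USB
7. Configuration restored automatically from config/
   - A different USB drive has a new GUID, so the license key must be transferred to it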

Frequency:

  • Weekly minimum
  • After ANY configuration change
  • Before major updates

Container Data Backup

Critical Directories:

Priority 1 (CRITICAL):
/mnt/user/appdata/vaultwarden/     🚨 Your passwords!
/mnt/user/appdata/gitea/            🚨 Your code repositories!

Priority 2 (Important):
/mnt/user/appdata/NginxProxyManager/  Proxy configs
/mnt/user/appdata/Grafana/            Dashboards
/mnt/user/appdata/Influxdb/           Metrics history

Priority 3 (Optional):
/mnt/user/appdata/open-webui/         LLM chat history

Quick Backup Script:

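#!/bin/bash
# Save as: /mnt/user/scripts/backup-critical.sh
# Stops the critical containers, archives their appdata, then restarts them.

CONTAINERS="vaultwarden Gitea NginxProxyManager"
BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR" || exit 1

# Restart the containers when the script exits, even if a backup step fails
# ($CONTAINERS is left unquoted on purpose so it splits into three names)
trap 'echo "Restarting containers..."; docker start $CONTAINERS' EXIT

echo "Stopping containers..."
docker stop $CONTAINERS

echo "Backing up data..."
status=0
# -C / stores paths relative to /, which matches the restore command (tar -xzf ... -C /)
tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" -C / mnt/user/appdata/vaultwarden || status=1
tar -czf "$BACKUP_DIR/gitea.tar.gz" -C / mnt/user/appdata/gitea || status=1
tar -czf "$BACKUP_DIR/npm.tar.gz" -C / mnt/user/appdata/NginxProxyManager || status=1

if [ "$status" -eq 0 ]; then
    echo "✅ Backup complete: $BACKUP_DIR"
else
    echo "❌ Backup finished with errors: $BACKUP_DIR" >&2
fi
ls -lh "$BACKUP_DIR"
exit "$status"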

Make Executable:

chmod +x /mnt/user/scripts/backup-critical.sh

Run Manually:

/mnt/user/scripts/backup-critical.sh

Schedule (User Scripts Plugin):

  • Frequency: Daily at 2 AM
  • Retention: Keep last 30 days

Restore from Backup:

# Example: Restore Vaultwarden
docker stop vaultwarden

# Backup current (corrupted) data
mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old

# Extract backup
tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C /

# Restart container
docker start vaultwarden

# Verify working
curl http://192.168.68.51:4743
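Before moving the live data aside, it is worth confirming that the archive can actually be read. The sketch below lists the archive contents without extracting anything. `verify_archive` is a hypothetical helper name, and passing this check only shows the file is a readable gzip tarball, not that the application data inside is consistent.

```shell
# Hypothetical helper: succeed only if the file is non-empty and tar can
# read the whole gzip-compressed archive.
verify_archive() {
    [ -s "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}
```

Usage: `verify_archive /mnt/user/backups/20251031_120000/vaultwarden.tar.gz || echo "Archive unreadable, do not restore from it"`.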

Quick Commands Reference

System Status

# System uptime and load
uptime

# Resource usage
free -h
df -h

# Array status (mdcmd status prints the md driver state, including mdState)
mdcmd status | grep mdState

# Docker container summary
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Temperature (if sensors installed)
sensors

# Disk health quick check
smartctl -H /dev/sdb  # Parity
smartctl -H /dev/sdc  # Disk 1

Docker Quick Commands

# Start all stopped containers (⚠️ this also starts containers you stopped on purpose)
docker start $(docker ps -aq -f status=exited)

# Stop all running containers
docker stop $(docker ps -q)

# View logs (last 50 lines)
docker logs --tail 50 <container_name>

# Follow logs in real-time
docker logs -f <container_name>

# Restart container
docker restart <container_name>

# Remove container (⚠️ will lose non-volume data!)
docker rm <container_name>

# Clean up unused resources
docker system prune        # ⚠️ Removes ALL stopped containers, unused networks, dangling images
docker system prune -a     # ⚠️ Also removes all unused images, not just dangling ones!
docker system prune --volumes  # ⚠️ Also removes unused volumes!

Network Diagnostics

# Check all interfaces
ip addr show

# Test key infrastructure
ping -c 3 192.168.68.1   # Router
ping -c 3 192.168.68.51  # Unraid
ping -c 3 192.168.68.61  # Pi-hole
ping -c 3 8.8.8.8        # Internet

# DNS resolution test
nslookup google.com
nslookup google.com 192.168.68.61  # Test Pi-hole specifically

# Check listening ports
netstat -tulpn | grep LISTEN

# Test specific port
nc -zv 192.168.68.51 3002  # Example: Gitea
curl -I http://192.168.68.51:3002  # HTTP test

Quick Health Check Script

#!/bin/bash
# Save as: /mnt/user/scripts/health-check.sh

echo "=== Unraid Health Check ==="
echo ""

echo "1. Array Status:"
mdcmd status | grep mdState

echo ""
echo "2. Running Containers:"
docker ps --format "table {{.Names}}\t{{.Status}}"

echo ""
echo "3. Disk Usage:"
df -h | grep -E "cache|disk1|Filesystem"

echo ""
echo "4. Network Connectivity:"
ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo "  Router: ✅ OK" || echo "  Router: ❌ FAIL"
ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo "  Internet: ✅ OK" || echo "  Internet: ❌ FAIL"
ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo "  Pi-hole: ✅ OK" || echo "  Pi-hole: ❌ FAIL"

echo ""
echo "5. Critical Services:"
curl -s --max-time 5 http://localhost:4743 >/dev/null && echo "  Vaultwarden: ✅ OK" || echo "  Vaultwarden: ❌ DOWN"
curl -s --max-time 5 http://localhost:3002 >/dev/null && echo "  Gitea: ✅ OK" || echo "  Gitea: ❌ DOWN"
curl -s --max-time 5 http://localhost:7818 >/dev/null && echo "  NPM: ✅ OK" || echo "  NPM: ❌ DOWN"

echo ""
echo "=== Health Check Complete ==="

Run: bash /mnt/user/scripts/health-check.sh


📞 Getting Help

Pre-flight Checks

Before asking for help, gather this information:

  1. System Diagnostics

    • Unraid WebGUI: Tools → Diagnostics → Download
    • Creates ZIP with all logs
  2. Container Logs

    docker logs <container_name> > container-logs.txt 2>&1
    
  3. Network Configuration

    ip addr show > network-config.txt
    ip route show >> network-config.txt
    
  4. Disk Status

    smartctl -a /dev/sdb > disk-smart.txt
    smartctl -a /dev/sdc >> disk-smart.txt
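The command-line items above can be gathered in one pass so nothing is forgotten under pressure. This is a sketch, not a replacement for the Unraid diagnostics ZIP. `collect_support_info` is a hypothetical helper name, /dev/sdb and /dev/sdc are the device names used in this guide, and the 200-line log limit is an arbitrary choice.

```shell
# Hypothetical helper: collect network config, SMART data and container logs
# into one directory and print that directory's path.
# Usage: collect_support_info [output_dir]
collect_support_info() {
    out=${1:-./support-$(date +%Y%m%d_%H%M%S)}
    mkdir -p "$out" || return 1

    { ip addr show; ip route show; } > "$out/network-config.txt" 2>&1

    for dev in /dev/sdb /dev/sdc; do
        smartctl -a "$dev"
    done > "$out/disk-smart.txt" 2>&1

    # docker logs writes to both stdout and stderr, so capture both
    for name in $(docker ps -a --format '{{.Names}}'); do
        docker logs --tail 200 "$name" > "$out/logs-$name.txt" 2>&1
    done

    echo "$out"
}
```

Usage: `collect_support_info /mnt/user/support`. Container logs can contain tokens, hostnames, or e-mail addresses, so read them through before posting anything publicly.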
    

Community Resources

  • Unraid Forums: https://forums.unraid.net/

    • Post diagnostics ZIP
    • Be specific about symptoms
    • Include what you've tried
  • r/unraid: https://reddit.com/r/unraid

    • Quick questions
    • Share diagnostics in pastebin
  • Discord: Unraid Official Discord

    • Real-time help
    • Active community

Emergency Contacts

ISP Support: [Your ISP Phone Number]
Unraid License: [Store in secure location]
USB Backup Location: [Document where stored]
Off-site Backup: [If applicable]

🎓 Post-Recovery Checklist

After restoring from disaster:

[ ] Unraid array started successfully
[ ] All critical services running
    [ ] NginxProxyManager
    [ ] Cloudflared  
    [ ] Vaultwarden
    [ ] Gitea
[ ] Network connectivity verified
    [ ] Can access Unraid WebUI
    [ ] Can ping router (192.168.68.1)
    [ ] Internet working
    [ ] DNS resolving (Pi-hole)
[ ] Vaultwarden accessible (test password retrieval)
[ ] Gitea accessible (verify repositories intact)
[ ] NPM routing working (test reverse proxy)
[ ] Monitoring stack restarted
    [ ] Grafana
    [ ] InfluxDB
    [ ] Telegraf
[ ] External access working
    [ ] Tailscale connected
    [ ] Cloudflare tunnel active
[ ] Backups verified and up-to-date
[ ] Documentation updated with lessons learned
[ ] Incident documented in change log (Gitea)
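The container items on this checklist can be verified mechanically. The sketch below compares a list of expected container names against what `docker ps` reports as running. `missing_containers` is a hypothetical helper name, and the names must match the container names exactly, including case.

```shell
# Hypothetical helper: print every expected container that is NOT running.
# Usage: missing_containers <name> [<name> ...]
missing_containers() (
    running=$(docker ps --format '{{.Names}}')
    for name in "$@"; do
        # -x: whole-line match, -F: treat the name as a literal string
        printf '%s\n' "$running" | grep -qxF -- "$name" || echo "$name"
    done
)
```

Usage: `missing_containers NginxProxyManager Cloudflared vaultwarden Gitea Grafana Influxdb Telegraf`. Empty output means every listed container is running.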

🔒 Security After Recovery

Immediately After Disaster Recovery:

  1. Change Passwords (if compromise suspected)

    [ ] Unraid root password
    [ ] Vaultwarden master password
    [ ] Container admin passwords
    [ ] Pi-hole admin password
    [ ] PiKVM password
    
  2. Review Access Logs

    # Check SSH attempts (on Unraid, sshd messages normally land in /var/log/syslog)
    grep "Failed password" /var/log/syslog | tail -50
    
    # Check NPM access (2>&1 so errors written to stderr are searched too)
    docker logs NginxProxyManager 2>&1 | grep -i error
    
    # Check Gitea access
    docker logs Gitea 2>&1 | grep -i login
    
  3. Verify Firewall Rules

    iptables -L -n -v
    
  4. Check for Unauthorized Changes

    # Review Docker containers
    docker ps -a
    
    # Check cron jobs
    crontab -l
    
    # Review network interfaces
    ip addr show
    

📝 Documentation Updates After Incident

What to Document:

  1. What Happened:

    • Date/time of incident
    • Symptoms observed
    • Root cause (if determined)
    • Duration of outage
  2. What You Did:

    • Steps taken to recover
    • What worked / didn't work
    • Resources used (forums, docs, etc.)
    • Time to recovery
  3. Lessons Learned:

    • What could prevent this in future
    • Process improvements needed
    • Documentation gaps discovered
    • Backup improvements needed
  4. Action Items:

    • Backups to implement/improve
    • Monitoring to add
    • Scripts to create
    • Hardware to replace/upgrade

Where to Document:

  • Create incident report: docs/incidents/YYYY-MM-DD-incident-name.md
  • Update this quick-start guide with new procedures
  • Add to troubleshooting section if recurring issue
  • Commit to Gitea with detailed message
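An incident report is more likely to get written if the file already exists with the headings in place. The sketch below scaffolds one using the docs/incidents/YYYY-MM-DD-incident-name.md convention and the four headings listed above. `new_incident` is a hypothetical helper name, and it assumes it is run from the root of the documentation repository.

```shell
# Hypothetical helper: create a dated incident report skeleton and print its path.
# Usage: new_incident <incident-name>   (e.g. usb-boot-failure)
# Refuses to overwrite an existing report.
new_incident() {
    [ -n "$1" ] || { echo "usage: new_incident <incident-name>" >&2; return 1; }
    file="docs/incidents/$(date +%Y-%m-%d)-$1.md"
    mkdir -p docs/incidents || return 1
    if [ -e "$file" ]; then
        echo "$file already exists" >&2
        return 1
    fi
    {
        printf '# Incident: %s\n\n' "$1"
        printf '## What Happened\n\n'
        printf '## What You Did\n\n'
        printf '## Lessons Learned\n\n'
        printf '## Action Items\n'
    } > "$file"
    echo "$file"
}
```

Usage: `new_incident usb-boot-failure`, then fill in the sections and commit the file to Gitea.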

🚀 Normal Startup Sequence

From Cold Boot:

1. Power on server
   ↓
2. BIOS POST (~30 seconds)
   - Hardware check
   - Memory test
   - Drive detection
   ↓
3. Unraid boots from USB (~1-2 minutes)
   - Linux kernel loads
   - Unraid OS starts
   ↓
4. Network initializes
   - br0 interface up
   - Gets IP: 192.168.68.51
   ↓
5. Array auto-starts (if configured)
   - Parity disk: sdb
   - Data disk: sdc
   - Cache: nvme1n1p1
   ↓
6. Docker service starts
   - docker0 bridge created
   - Networks initialized
   ↓
7. Containers auto-start (if enabled)
   - Infrastructure services first
   - Then application services
   ↓
8. Services available (~3-5 minutes total)
   ✅ Ready to use!

Expected Boot Time: 3-5 minutes
If Taking Longer: Check system log for errors


🎯 Quick Health Check Command

Run After Any Restart:

# Quick one-liner health check
docker ps --format "table {{.Names}}\t{{.Status}}"; \
df -h | grep -E "cache|disk1"; \
if ping -c 2 192.168.68.1 >/dev/null; then echo "Network: OK"; else echo "Network: FAIL"; fi

Related Documentation:

  • Network Issues: See network-map.md
  • Service Details: See service-inventory.md
  • Container Configs: See docker-compose/ (when created)
  • Main Overview: See README.md

🆘 True Emergency - Complete System Down

If everything is down and you need immediate help:

  1. Access via PiKVM (https://192.168.68.53)

  2. Check Physical Server

    • Power LED on?
    • Fans spinning?
    • Drives spinning up?
    • Network activity lights?
  3. Try Safe Mode Boot

    • Select a Safe Mode entry from the Unraid boot menu (starts without plugins)
    • Diagnose from console
  4. Community Help

    • Unraid Discord (fastest response)
    • Forums with diagnostics ZIP
    • r/unraid for quick questions
  5. Document Everything

    • Take photos/screenshots via PiKVM
    • Note exact error messages
    • Record what you tried
    • Timeline of events

💡 Pro Tips

  1. Test Your Backups

    • Restore test annually
    • Verify data integrity
    • Practice recovery procedures
  2. Keep This Guide Accessible

    • Save offline copy to phone/laptop
    • Print critical sections
    • Bookmark in browser
  3. Automate Where Possible

    • Schedule backup scripts
    • Set up monitoring alerts
    • Use User Scripts plugin
  4. Document As You Go

    • Update after fixing issues
    • Add new procedures discovered
    • Note what worked/didn't work

Last Updated: October 31, 2025
Next Review: Quarterly or after incidents
Maintained By: Weston


Remember: Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help!

Keep this guide accessible even when the server is down!
💡 Pro Tip: Save a copy to your phone/laptop/OneDrive!

🚀 You've got this!