# 🚀 Quick Start & Emergency Recovery Guide

**Purpose:** Get your homelab back online quickly after disaster
**Target Time:** 30-60 minutes to basic functionality
**Last Updated:** October 31, 2025

---

## 🎯 Quick Access Reference

### Essential URLs

| Service | URL | Default Credentials |
|---------|-----|---------------------|
| **Unraid Dashboard** | http://192.168.68.51 | root / (your password) |
| **Gitea** | https://gitea.segelschiff.app | Weston / (your password) |
| **Vaultwarden** | http://192.168.68.51:4743 | Master password |
| **NPM Admin** | http://192.168.68.51:7818 | admin@example.com / changeme (first login) |
| **Pi-hole** | http://192.168.68.61/admin | (your password) |
| **PiKVM** | https://192.168.68.53 | admin / admin (default) |

### SSH Access

```bash
# Local network
ssh root@192.168.68.51

# Via Tailscale (from anywhere)
ssh root@100.122.220.126

# Emergency: Use PiKVM for console access
# https://192.168.68.53
```

---

## 🆘 Emergency Recovery Scenarios

### Scenario 1: Server Won't Boot 🚨

**Symptoms:**
- No network connectivity to 192.168.68.51
- Unraid WebUI unreachable
- No response to ping

**Recovery Steps:**

1. **Physical Check** (via PiKVM or in person)
   ```
   [ ] Server has power (check LED)
   [ ] Network cable connected to eth0
   [ ] Monitor shows output (via PiKVM)
   [ ] USB boot drive is present and detected
   ```

2. **Use PiKVM for Remote Console**
   - Access: https://192.168.68.53
   - Login: admin / admin
   - View boot process
   - Check BIOS/boot messages

3. **Common Boot Issues**

   **USB Boot Drive Failure** (Most common!)
   ```
   Symptoms: "Boot device not found" or similar

   Fix:
   1. Have backup USB ready
   2. Shut down server (via PiKVM power control)
   3. Replace USB boot drive
   4. Power on
   5. Restore configuration from backup
   6. Transfer the license key (Unraid keys are tied to the flash drive's GUID)
   ```

   **BIOS Settings Changed**
   ```
   Fix:
   1. Enter BIOS (DEL/F2 during boot)
   2. Load defaults
   3. Verify boot order (USB first)
   4. Save and exit
   ```

   **Hardware Failure**
   ```
   Check:
   1. RAM seated properly
   2. All drives detected in BIOS
   3. CPU fan spinning
   4. No error beeps
   ```

4. **Boot from Backup USB**
   ```
   Steps:
   1. Power off server
   2. Insert backup USB boot drive
   3. Power on
   4. Verify boot successful
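   5. If the backup USB is out of date, restore the latest configuration:
      - Power off and extract the newest flash backup ZIP onto the USB
        (see "USB Flash Backup" under Backup & Restore Procedures)
      - Reboot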
   ```

**Prevention:**
- ✅ Keep USB flash backup updated (weekly)
- ✅ Store backup USB in safe location
- ✅ Document BIOS settings (screenshots via PiKVM)

---

### Scenario 2: Lost Admin Password

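**Unraid Root Password Reset:**

Both console methods below assume a root shell is still reachable on the console (for example, a session left logged in). `passwd` cannot be run from a bare login prompt without the current password. If no session is open, the commonly documented fallback is to shut down, mount the USB boot drive on another computer, and remove the password files under `config/` so that root boots with a blank password. Confirm the exact file names against the official Unraid documentation first, since this resets every user's password.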

1. **Via PiKVM Console**
   ```
   1. Access PiKVM: https://192.168.68.53
   2. View console in browser
   3. Wait for login prompt
   4. Press Ctrl+Alt+F2 (via PiKVM keyboard)
   5. At terminal: passwd root
   6. Enter new password twice
   7. Press Ctrl+Alt+F1 to return to GUI
   8. Update documentation
   ```

2. **Via Physical Access**
   ```
   1. Connect monitor and keyboard to server
   2. Press Ctrl+Alt+F2
   3. Run: passwd root
   4. Set new password
   5. Press Ctrl+Alt+F1
   ```

**Container Passwords:**
- Check `/mnt/user/appdata/<service>/config`
- Review environment variables in Docker templates
- Use Vaultwarden if accessible
- Check this documentation repo in Gitea

---

### Scenario 3: Container Won't Start

**Quick Diagnosis:**

```bash
# Check container status
docker ps -a | grep <container_name>

# View recent logs
docker logs --tail 100 <container_name>

# Show state, exit code, and any recorded startup error
docker inspect --format '{{.State.Status}} exit={{.State.ExitCode}} {{.State.Error}}' <container_name>
```

**Common Fixes:**

**Port Conflict:**
```bash
# Find what's using the port
netstat -tulpn | grep <port>

# Example: Port 3000 already in use
netstat -tulpn | grep 3000

# Stop conflicting service
docker stop <conflicting_container>
```
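
When the conflicting listener is another container, Docker can name it directly. The sketch below wraps `docker ps --filter publish=<port>`; the function name `port_owner` is hypothetical, and it only sees ports published through Docker's port mapping, so host-network or custom-IP containers still need the `netstat` check above.

```bash
# Print the names of running containers that publish a given host port.
# Assumes the Docker CLI is on PATH; "port_owner" is a hypothetical helper name.
port_owner() {
  local port="$1"
  if [ -z "$port" ]; then
    echo "usage: port_owner <port>" >&2
    return 2
  fi
  docker ps --filter "publish=$port" --format '{{.Names}}'
}

# Example: port_owner 3000
```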

**Volume Permission Issues:**
```bash
# Check ownership
ls -la /mnt/user/appdata/<container_name>

# Fix permissions (Unraid standard: 99:100)
chown -R 99:100 /mnt/user/appdata/<container_name>

# Example: Fix Vaultwarden
chown -R 99:100 /mnt/user/appdata/vaultwarden
```

**Dependency Missing:**
```bash
# Example: Guacamole needs MariaDB
docker start mariadb
sleep 10  # Wait for database initialization
docker start ApacheGuacamole

# Verify dependency is running
docker ps | grep mariadb
```
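
A fixed `sleep` can be too short on a cold cache or needlessly long on a warm one. As a rough alternative, the sketch below retries a readiness command until it succeeds. `retry_until` is a hypothetical helper, and the example in the comments assumes MariaDB publishes its default port 3306 on the host, which may not match your template.

```bash
# Retry a command once per second until it succeeds or the timeout expires.
# "retry_until" is a hypothetical helper; usage: retry_until <seconds> <command...>
retry_until() {
  local timeout="$1"; shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example (port 3306 on the host is an assumption):
# docker start mariadb && retry_until 30 nc -z 127.0.0.1 3306 && docker start ApacheGuacamole
```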

**Resource Exhaustion:**
```bash
# Check cache usage
df -h /mnt/cache

# If cache full (>90%), clean up
docker system prune -a  # ⚠️ REMOVES STOPPED CONTAINERS AND ALL UNUSED IMAGES!

# Or free space manually
# See service-inventory.md for cleanup recommendations
```
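
The 90% threshold can be checked mechanically rather than by reading `df` output. This is a minimal sketch; `usage_pct` and `is_nearly_full` are hypothetical helper names, and the default threshold simply mirrors the 90% figure above.

```bash
# Print the used percentage (integer, no % sign) of the filesystem holding a path.
# "usage_pct" is a hypothetical helper; df -P keeps each filesystem on one line.
usage_pct() {
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Succeed when usage is at or above the threshold (default 90).
is_nearly_full() {
  local pct
  pct=$(usage_pct "$1")
  [ -n "$pct" ] && [ "$pct" -ge "${2:-90}" ]
}

# Example: is_nearly_full /mnt/cache && echo "Cache above 90%, clean up"
```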

---

### Scenario 4: Network Connectivity Issues

**Can't Access from LAN:**

```bash
# SSH into Unraid (use the PiKVM console instead if the network is down)
ssh root@192.168.68.51

# Check if br0 is up
ip addr show br0
# Should show: 192.168.68.51/22

# Verify IP and routes
ip route | grep default
# Should show: default via 192.168.68.1

# Test router connectivity
ping -c 3 192.168.68.1

# Test internet
ping -c 3 8.8.8.8

# Test DNS (Pi-hole)
nslookup google.com 192.168.68.61
```

**Fix Network Issues:**

```bash
# Restart networking (from console/PiKVM)
/etc/rc.d/rc.inet1 restart

# If that doesn't work, reboot
reboot
```

**Can't Access Containers:**

```bash
# Check Docker network
docker network inspect bridge

# Verify container IP
docker inspect <container_name> | grep IPAddress

# Test from Unraid host
curl http://172.17.0.5:8080  # Example: open-webui

# Test port mapping
curl http://192.168.68.51:3000  # Should reach open-webui
```

**DNS Not Resolving:**

```bash
# Test Pi-hole directly
nslookup google.com 192.168.68.61

# If Pi-hole down, check Pi Zero
ping 192.168.68.61

# SSH to Pi-hole
ssh pi@192.168.68.61

# Check Pi-hole status
pihole status

# Restart if needed
pihole restartdns
```

---

### Scenario 5: Array Won't Start

**Symptoms:**
- Unraid GUI accessible but array shows "Stopped"
- Disks show errors or missing

**Troubleshooting:**

```bash
# Check disk health
smartctl -a /dev/sdb  # Parity
smartctl -a /dev/sdc  # Disk 1

# View disk assignments
cat /boot/config/disk.cfg

# Check for filesystem errors (read-only; array must be started in Maintenance mode)
xfs_repair -n /dev/md1p1
```
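
To scan several disks in one pass, the SMART health line can be reduced to a one-line verdict per device. This is only a sketch: `smart_summary` is a hypothetical name, and it matches the two overall-health phrasings `smartctl -H` usually prints for ATA and SCSI devices, so anything it reports as CHECK deserves a full `smartctl -a`.

```bash
# Print a one-line SMART verdict for each device given as an argument.
# "smart_summary" is a hypothetical helper; it greps the text of `smartctl -H`.
smart_summary() {
  local dev out
  for dev in "$@"; do
    out=$(smartctl -H "$dev" 2>&1)
    if grep -qE 'result: PASSED|Health Status: OK' <<<"$out"; then
      echo "$dev: PASSED"
    else
      echo "$dev: CHECK"
    fi
  done
}

# Example: smart_summary /dev/sdb /dev/sdc
```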

**Common Causes:**
- Parity sync in progress (wait for completion)
- Disk failed (check SMART, may need replacement)
- Unclean shutdown (filesystem check required)
- Disk assignment changed

**Recovery:**

1. **Start Array in Maintenance Mode**
   - In the Unraid GUI, go to Main → Array Operation
   - Tick the "Maintenance mode" checkbox, then click "Start"
   - Run a filesystem check on any disk that reports errors

2. **Review Logs**
   - Tools → System Log
   - Look for disk errors
   - Check for power events

3. **If Disk Failed**
   - Follow Unraid disk replacement procedure
   - Do NOT format or write to disk unnecessarily
   - Seek help in Unraid forums if uncertain

---

## 🔧 Critical Service Restart Procedures

### Restart Core Services (Proper Order)

**1. Infrastructure First:**
```bash
# Start reverse proxy (for routing)
docker start NginxProxyManager

# Wait for it to be ready
sleep 5
docker ps | grep NginxProxyManager

# Start tunnel (for remote access)
docker start Cloudflared

# Verify both running
docker ps | grep -E "NginxProxyManager|Cloudflared"
```

**2. Security Services:**
```bash
# Password manager (critical!)
docker start vaultwarden

# Wait for healthy status
sleep 10
docker ps | grep vaultwarden
# Should show "(healthy)"

# If not healthy, check logs
docker logs --tail 50 vaultwarden
```
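
Instead of sleeping and then reading `docker ps` by eye, the health status can be polled. The sketch below is one way to do that; `container_health` and `wait_healthy` are hypothetical names, and containers without a health check are treated as ready once they are `running`, which is an assumption you may want to tighten.

```bash
# Print a container's health status, or its plain state if it defines no health check.
# "container_health" and "wait_healthy" are hypothetical helper names.
container_health() {
  docker inspect --format \
    '{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}' "$1"
}

# Poll once per second until the container reports "healthy" or "running".
wait_healthy() {
  local name="$1" timeout="${2:-30}" waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    status=$(container_health "$name" 2>/dev/null)
    case "$status" in
      healthy|running) echo "$name: $status"; return 0 ;;
    esac
    sleep 1
    waited=$((waited + 1))
  done
  echo "$name: not ready after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}

# Example: docker start vaultwarden && wait_healthy vaultwarden 60
```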

**3. Development Tools:**
```bash
# Git server
docker start Gitea

# Wait for initialization
sleep 5

# Remote access gateway
# Note: if Guacamole uses MariaDB, start mariadb first and let it initialize
docker start ApacheGuacamole
```

**4. Monitoring (IMPORTANT!):**
```bash
# Database first
docker start Influxdb

# Wait for DB to initialize
sleep 15

# Then metrics collector
docker start Telegraf

# Finally visualization
docker start Grafana

# Verify all running
docker ps | grep -E "Influxdb|Telegraf|Grafana"
```

**5. Optional Services:**
```bash
# LLM backend
docker start ollama
sleep 10

# LLM interface
docker start open-webui

# Wait for healthy
docker ps | grep open-webui
```
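
The staged startup above can also be expressed as a single loop, which is easier to keep in sync as services are added. This is a sketch rather than a tested part of this setup: `start_stack` is a hypothetical helper, and the delays in the example simply copy the `sleep` values used in the stages above.

```bash
# Start containers in order, pausing after each one.
# Each argument is "name:seconds" (seconds defaults to 0 if omitted).
# "start_stack" is a hypothetical helper name.
start_stack() {
  local entry name delay
  for entry in "$@"; do
    name="${entry%%:*}"
    delay="${entry##*:}"
    [ "$delay" = "$entry" ] && delay=0   # no ":seconds" suffix given
    echo "Starting $name (then waiting ${delay}s)..."
    if ! docker start "$name" >/dev/null; then
      echo "Failed to start $name; stopping here" >&2
      return 1
    fi
    sleep "$delay"
  done
}

# Example mirroring the monitoring stage:
# start_stack Influxdb:15 Telegraf Grafana
```

Stopping at the first failure is deliberate: later services in the list usually depend on earlier ones, so continuing would mostly produce misleading errors.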

---

### Stop All Services Gracefully

```bash
# Stop all running containers (does nothing if none are running)
docker ps -q | xargs -r docker stop

# Verify all stopped
docker ps
# Should list only the header row

# Wait before stopping array
sleep 5

# Stop array (from GUI)
# Main → Array Operation → Stop
```

---

## 📦 Backup & Restore Procedures

### USB Flash Backup (Unraid Configuration)

**Create Backup:**
1. Navigate to: **Main → Flash → Flash Backup**
2. Click "Backup Now"
3. Download ZIP file (e.g., `unraid-flash-backup-20251031.zip`)
4. Store securely OFF-SERVER:
   - OneDrive: `/z_Unraid/Backups/`
   - External drive
   - Cloud storage

**Restore from Backup:**
```
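1. Format new USB drive as FAT32 with the volume label UNRAID
2. Extract the backup ZIP to the root of the USB
   - config/ directory
   - bzimage, bzroot, etc.
3. Run the make_bootable script for your OS from the USB root
4. Safely eject USB
5. Boot from new USB
6. Configuration loads from config/ automatically
   (the license key must be transferred to the new drive)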
```

**Frequency:**
- Weekly minimum
- After ANY configuration change
- Before major updates

---

### Container Data Backup

**Critical Directories:**

```
Priority 1 (CRITICAL):
/mnt/user/appdata/vaultwarden/     🚨 Your passwords!
/mnt/user/appdata/gitea/            🚨 Your code repositories!

Priority 2 (Important):
/mnt/user/appdata/NginxProxyManager/  Proxy configs
/mnt/user/appdata/Grafana/            Dashboards
/mnt/user/appdata/Influxdb/           Metrics history

Priority 3 (Optional):
/mnt/user/appdata/open-webui/         LLM chat history
```

**Quick Backup Script:**

```bash
#!/bin/bash
# Save as: /mnt/user/scripts/backup-critical.sh
set -euo pipefail

CONTAINERS=(vaultwarden Gitea NginxProxyManager)
BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Restart the containers on exit, even if a backup step fails
trap 'echo "Restarting containers..."; docker start "${CONTAINERS[@]}"' EXIT

echo "Stopping containers..."
docker stop "${CONTAINERS[@]}"

echo "Backing up data..."
tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" /mnt/user/appdata/vaultwarden
tar -czf "$BACKUP_DIR/gitea.tar.gz" /mnt/user/appdata/gitea
tar -czf "$BACKUP_DIR/npm.tar.gz" /mnt/user/appdata/NginxProxyManager

echo "✅ Backup complete: $BACKUP_DIR"
ls -lh "$BACKUP_DIR"
```

**Make Executable:**
```bash
chmod +x /mnt/user/scripts/backup-critical.sh
```

**Run Manually:**
```bash
/mnt/user/scripts/backup-critical.sh
```

**Schedule (User Scripts Plugin):**
- Frequency: Daily at 2 AM
- Retention: Keep last 30 days
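
The backup script itself does not enforce the 30-day retention, so something has to delete old runs. One possible approach is sketched below: `prune_backups` is a hypothetical helper, and it assumes each run lives in its own directory directly under `/mnt/user/backups`, as the script above creates them.

```bash
# Delete backup directories last modified more than N days ago (default 30).
# "prune_backups" is a hypothetical helper; the default root matches the backup script.
prune_backups() {
  local root="${1:-/mnt/user/backups}" days="${2:-30}"
  [ -d "$root" ] || { echo "No such directory: $root" >&2; return 1; }
  # Only direct children of $root are considered; each deleted path is printed.
  find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -print -exec rm -rf {} +
}

# Example: prune_backups /mnt/user/backups 30
```

Running it as the last step of the scheduled job keeps retention and backup on the same schedule. Try it against a scratch directory first, since it deletes without prompting.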

---

**Restore from Backup:**

```bash
# Example: Restore Vaultwarden
docker stop vaultwarden

# Backup current (corrupted) data
mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old

# Extract backup
tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C /

# Restart container
docker start vaultwarden

# Verify working
curl http://192.168.68.51:4743
```
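
Before moving the live data aside, it is worth confirming that the archive is readable at all. A minimal sketch follows; `verify_archive` is a hypothetical helper, and it only proves that the gzip stream and tar listing are intact, not that the application data inside is consistent.

```bash
# Succeed only if the archive exists, is non-empty, and its contents list cleanly.
# "verify_archive" is a hypothetical helper name.
verify_archive() {
  local archive="$1"
  [ -s "$archive" ] || { echo "Missing or empty: $archive" >&2; return 1; }
  if tar -tzf "$archive" >/dev/null 2>&1; then
    echo "OK: $archive"
  else
    echo "Corrupt or unreadable: $archive" >&2
    return 1
  fi
}

# Example: verify_archive /mnt/user/backups/20251031_120000/vaultwarden.tar.gz
```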

---

## ⚡ Quick Commands Reference

### System Status

```bash
# System uptime and load
uptime

# Resource usage
free -h
df -h

# Array status (/proc/mdcmd only accepts commands; status is read from /proc/mdstat)
cat /proc/mdstat

# Docker container summary
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Temperature (if sensors installed)
sensors

# Disk health quick check
smartctl -H /dev/sdb  # Parity
smartctl -H /dev/sdc  # Disk 1
```

### Docker Quick Commands

```bash
# Start all stopped containers
docker ps -aq -f status=exited | xargs -r docker start

# Stop all running containers
docker ps -q | xargs -r docker stop

# View logs (last 50 lines)
docker logs --tail 50 <container_name>

# Follow logs in real-time
docker logs -f <container_name>

# Restart container
docker restart <container_name>

# Remove container (⚠️ will lose non-volume data!)
docker rm <container_name>

# Clean up unused resources
docker system prune        # Removes stopped containers, unused networks, dangling images
docker system prune -a     # ⚠️ Removes unused images too!
docker system prune --volumes  # ⚠️ Removes unused volumes!
```

### Network Diagnostics

```bash
# Check all interfaces
ip addr show

# Test key infrastructure
ping -c 3 192.168.68.1   # Router
ping -c 3 192.168.68.51  # Unraid
ping -c 3 192.168.68.61  # Pi-hole
ping -c 3 8.8.8.8        # Internet

# DNS resolution test
nslookup google.com
nslookup google.com 192.168.68.61  # Test Pi-hole specifically

# Check listening ports
netstat -tulpn | grep LISTEN

# Test specific port
nc -zv 192.168.68.51 3002  # Example: Gitea
curl -I http://192.168.68.51:3002  # HTTP test
```

### Quick Health Check Script

```bash
#!/bin/bash
# Save as: /mnt/user/scripts/health-check.sh

echo "=== Unraid Health Check ==="
echo ""

echo "1. Array Status:"
grep mdState /proc/mdstat

echo ""
echo "2. Running Containers:"
docker ps --format "table {{.Names}}\t{{.Status}}"

echo ""
echo "3. Disk Usage:"
df -h | grep -E "cache|disk1|Filesystem"

echo ""
echo "4. Network Connectivity:"
ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo "  Router: ✅ OK" || echo "  Router: ❌ FAIL"
ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo "  Internet: ✅ OK" || echo "  Internet: ❌ FAIL"
ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo "  Pi-hole: ✅ OK" || echo "  Pi-hole: ❌ FAIL"

echo ""
echo "5. Critical Services:"
curl -sf --max-time 5 -o /dev/null http://localhost:4743 && echo "  Vaultwarden: ✅ OK" || echo "  Vaultwarden: ❌ DOWN"
curl -sf --max-time 5 -o /dev/null http://localhost:3002 && echo "  Gitea: ✅ OK" || echo "  Gitea: ❌ DOWN"
curl -sf --max-time 5 -o /dev/null http://localhost:7818 && echo "  NPM: ✅ OK" || echo "  NPM: ❌ DOWN"

echo ""
echo "=== Health Check Complete ==="
```

**Run:** `bash /mnt/user/scripts/health-check.sh`

---

## 📞 Getting Help

### Pre-flight Checks

Before asking for help, gather this information:

1. **System Diagnostics**
   - Unraid WebGUI: Tools → Diagnostics → Download
   - Creates ZIP with all logs

2. **Container Logs**
   ```bash
   docker logs <container_name> > container-logs.txt
   ```

3. **Network Configuration**
   ```bash
   ip addr show > network-config.txt
   ip route show >> network-config.txt
   ```

4. **Disk Status**
   ```bash
   smartctl -a /dev/sdb > disk-smart.txt
   smartctl -a /dev/sdc >> disk-smart.txt
   ```
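
The three manual captures above can be bundled into one call so nothing is forgotten under pressure. This is a sketch under stated assumptions: `collect_support_info` is a hypothetical helper, the disk list reuses the `/dev/sdb` and `/dev/sdc` devices named in this guide, and the 500-line log tail is an arbitrary choice.

```bash
# Gather network state, SMART data, and container logs into one directory.
# "collect_support_info" is a hypothetical helper; usage:
#   collect_support_info <output_dir> [container_name...]
collect_support_info() {
  local out="$1" dev name
  shift
  mkdir -p "$out" || return 1
  # Errors are captured in the files so one failing tool does not abort the run.
  { ip addr show; ip route show; } > "$out/network-config.txt" 2>&1
  for dev in /dev/sdb /dev/sdc; do
    smartctl -a "$dev"
  done > "$out/disk-smart.txt" 2>&1
  for name in "$@"; do
    docker logs --tail 500 "$name" > "$out/$name-logs.txt" 2>&1
  done
  echo "$out"
}

# Example: collect_support_info /tmp/support vaultwarden Gitea
```

Review the files before posting them publicly, since logs and interface listings can contain hostnames, addresses, or tokens.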

### Community Resources

- **Unraid Forums:** https://forums.unraid.net/
  - Post diagnostics ZIP
  - Be specific about symptoms
  - Include what you've tried

- **r/unraid:** https://reddit.com/r/unraid
  - Quick questions
  - Share diagnostics in pastebin

- **Discord:** Unraid Official Discord
  - Real-time help
  - Active community

### Emergency Contacts

```
ISP Support: [Your ISP Phone Number]
Unraid License: [Store in secure location]
USB Backup Location: [Document where stored]
Off-site Backup: [If applicable]
```

---

## 🎓 Post-Recovery Checklist

After restoring from disaster:

```
[ ] Unraid array started successfully
[ ] All critical services running
    [ ] NginxProxyManager
    [ ] Cloudflared
    [ ] Vaultwarden
    [ ] Gitea
[ ] Network connectivity verified
    [ ] Can access Unraid WebUI
    [ ] Can ping router (192.168.68.1)
    [ ] Internet working
    [ ] DNS resolving (Pi-hole)
[ ] Vaultwarden accessible (test password retrieval)
[ ] Gitea accessible (verify repositories intact)
[ ] NPM routing working (test reverse proxy)
[ ] Monitoring stack restarted
    [ ] Grafana
    [ ] InfluxDB
    [ ] Telegraf
[ ] External access working
    [ ] Tailscale connected
    [ ] Cloudflare tunnel active
[ ] Backups verified and up-to-date
[ ] Documentation updated with lessons learned
[ ] Incident documented in change log (Gitea)
```

---

## 🔒 Security After Recovery

**Immediately After Disaster Recovery:**

1. **Change Passwords** (if compromise suspected)
   ```
   [ ] Unraid root password
   [ ] Vaultwarden master password
   [ ] Container admin passwords
   [ ] Pi-hole admin password
   [ ] PiKVM password
   ```

2. **Review Access Logs**
   ```bash
   # Check SSH attempts (Unraid writes sshd messages to the main syslog)
   grep "Failed password" /var/log/syslog | tail -50

   # Check NPM access
   docker logs NginxProxyManager | grep -i error

   # Check Gitea access
   docker logs Gitea | grep -i login
   ```

3. **Verify Firewall Rules**
   ```bash
   iptables -L -n -v
   ```

4. **Check for Unauthorized Changes**
   ```bash
   # Review Docker containers
   docker ps -a

   # Check cron jobs
   crontab -l

   # Review network interfaces
   ip addr show
   ```
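
For the SSH check in step 2, a raw list of failures is hard to read at a glance, so grouping attempts by source address can help. The sketch below is illustrative: `failed_ssh_by_ip` is a hypothetical helper, it expects whichever log file holds sshd messages on your system, and it matches IPv4 addresses only.

```bash
# Count failed SSH password attempts per source IPv4 address, busiest first.
# "failed_ssh_by_ip" is a hypothetical helper; pass the log file to scan.
failed_ssh_by_ip() {
  local log="$1"
  [ -r "$log" ] || { echo "Cannot read: $log" >&2; return 1; }
  grep 'Failed password' "$log" \
    | grep -oE 'from [0-9]+(\.[0-9]+){3}' \
    | awk '{ print $2 }' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}

# Example: failed_ssh_by_ip <path-to-log> | head -10
```

Each output line is `<count> <address>`. A single address with many attempts points to brute forcing, while failures from a LAN address are more likely a mistyped password.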

---

## 📝 Documentation Updates After Incident

**What to Document:**

1. **What Happened:**
   - Date/time of incident
   - Symptoms observed
   - Root cause (if determined)
   - Duration of outage

2. **What You Did:**
   - Steps taken to recover
   - What worked / didn't work
   - Resources used (forums, docs, etc.)
   - Time to recovery

3. **Lessons Learned:**
   - What could prevent this in future
   - Process improvements needed
   - Documentation gaps discovered
   - Backup improvements needed

4. **Action Items:**
   - Backups to implement/improve
   - Monitoring to add
   - Scripts to create
   - Hardware to replace/upgrade

**Where to Document:**
- Create incident report: `docs/incidents/YYYY-MM-DD-incident-name.md`
- Update this quick-start guide with new procedures
- Add to troubleshooting section if recurring issue
- Commit to Gitea with detailed message
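
To make the incident report less of a chore, the file can be scaffolded with the four headings from the list above. This is a minimal sketch: `new_incident` is a hypothetical helper, and it assumes it is run from the root of the documentation repository.

```bash
# Create a dated incident report skeleton and print its path.
# "new_incident" is a hypothetical helper; run it from the repository root.
new_incident() {
  local slug="$1" dir="${2:-docs/incidents}" file
  [ -n "$slug" ] || { echo "usage: new_incident <incident-name> [dir]" >&2; return 2; }
  file="$dir/$(date +%Y-%m-%d)-$slug.md"
  [ -e "$file" ] && { echo "Already exists: $file" >&2; return 1; }
  mkdir -p "$dir" || return 1
  printf '# Incident: %s\n\n## What Happened\n\n## What You Did\n\n## Lessons Learned\n\n## Action Items\n' \
    "$slug" > "$file"
  echo "$file"
}

# Example: new_incident usb-boot-failure
```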

---

## 🚀 Normal Startup Sequence

**From Cold Boot:**

```
1. Power on server
   ↓
2. BIOS POST (~30 seconds)
   - Hardware check
   - Memory test
   - Drive detection
   ↓
3. Unraid boots from USB (~1-2 minutes)
   - Linux kernel loads
   - Unraid OS starts
   ↓
4. Network initializes
   - br0 interface up
   - Gets IP: 192.168.68.51
   ↓
5. Array auto-starts (if configured)
   - Parity disk: sdb
   - Data disk: sdc
   - Cache: nvme1n1p1
   ↓
6. Docker service starts
   - docker0 bridge created
   - Networks initialized
   ↓
7. Containers auto-start (if enabled)
   - Infrastructure services first
   - Then application services
   ↓
8. Services available (~3-5 minutes total)
   ✅ Ready to use!
```

**Expected Boot Time:** 3-5 minutes
**If Taking Longer:** Check system log for errors

---

## 🎯 Quick Health Check Command

**Run After Any Restart:**

```bash
# Quick one-liner health check (each check runs even if an earlier one fails)
docker ps --format "table {{.Names}}\t{{.Status}}"; \
df -h | grep -E "cache|disk1"; \
ping -c 2 192.168.68.1 >/dev/null && echo "Network: OK" || echo "Network: FAIL"
```

---

## 📚 Related Documentation

- **Network Issues:** See `network-map.md`
- **Service Details:** See `service-inventory.md`
- **Container Configs:** See `docker-compose/` (when created)
- **Main Overview:** See `README.md`

---

## 🆘 True Emergency - Complete System Down

**If everything is down and you need immediate help:**

1. **Access via PiKVM**
   - https://192.168.68.53
   - Get console access
   - View what's happening

2. **Check Physical Server**
   - Power LED on?
   - Fans spinning?
   - Drives spinning up?
   - Network activity lights?

3. **Try Safe Mode Boot**
   - Choose a Safe Mode entry from the Unraid boot menu (starts without plugins)
   - Diagnose from console

4. **Community Help**
   - Unraid Discord (fastest response)
   - Forums with diagnostics ZIP
   - r/unraid for quick questions

5. **Document Everything**
   - Take photos/screenshots via PiKVM
   - Note exact error messages
   - Record what you tried
   - Timeline of events

---

## 💡 Pro Tips

1. **Test Your Backups**
   - Restore test annually
   - Verify data integrity
   - Practice recovery procedures

2. **Keep This Guide Accessible**
   - Save offline copy to phone/laptop
   - Print critical sections
   - Bookmark in browser

3. **Automate Where Possible**
   - Schedule backup scripts
   - Set up monitoring alerts
   - Use User Scripts plugin

4. **Document As You Go**
   - Update after fixing issues
   - Add new procedures discovered
   - Note what worked/didn't work

---

**Next Review:** Quarterly or after incidents
**Maintained By:** Weston

---

**Remember:** Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help!

**Keep this guide accessible even when the server is down!**
💡 **Pro Tip:** Save a copy to your phone/laptop/OneDrive!

🚀 **You've got this!**