Phase 1 Complete: Foundation documentation

Added comprehensive homelab documentation:

README.md:
- Hardware inventory and specifications
- Network architecture overview
- Running services catalog
- Quick reference commands
- Project goals and roadmap

docs/network-map.md:
- All device IP assignments
- Port reference guide
- DNS configuration (Pi-hole + Unbound)
- Remote access setup (Tailscale + Cloudflare)
- Troubleshooting commands

docs/service-inventory.md:
- All 32 Docker containers cataloged
- Running services analysis (6 containers)
- Stopped services review (26 containers)
- Resource usage and recommendations
- Container decision matrix
- Cleanup plan to free 40GB
- Security recommendations
- Prioritized action plan

docs/quick-start.md:
- Emergency recovery procedures
- Service restart sequences
- Backup/restore guides with scripts
- Troubleshooting by scenario
- Health check automation
- Post-recovery checklist
- Common problem solutions

This establishes the foundation for all future homelab projects.

Phase 1 documentation complete! 🎉
2025-11-01 00:42:34 +01:00
parent e768ccb902
commit 6cbee11482
5 changed files with 3428 additions and 0 deletions
--- a/docs/service-inventory.md
+++ b/docs/service-inventory.md
@@ -0,0 +1,614 @@
+# 📦 Service Inventory - Complete Container Catalog
+
+**Last Updated:** October 31, 2025  
+**Total Containers:** 32 (6 running, 26 stopped)  
+**Purpose:** Comprehensive catalog of all services
+
+---
+
+## 📊 Quick Stats
+
+| Metric | Value | Status |
+|--------|-------|--------|
+| **Total Containers** | 32 | - |
+| **Running** | 6 | ✅ 19% |
+| **Stopped** | 26 | ⚠️ 81% |
+| **Total Docker Images** | ~50GB | ⚠️ High |
+| **Cache Usage** | 578GB / 932GB | ⚠️ 63% |
+
+**Key Insight:** 81% of containers are stopped - cleanup opportunity!
+
+---
+
+## 🟢 Running Services (6 containers)
+
+### 1. open-webui ⭐⭐⭐
+
+**Status:** Running (healthy)  
+**Container:** open-webui  
+**Image:** ghcr.io/open-webui/open-webui:main (4.55GB)  
+**Created:** 2025-10-16 (2 weeks ago)  
+**Network:** bridge (172.17.0.5)  
+**Ports:** 8080→3000
+
+**Resources:**
+- CPU: 0.15%
+- Memory: 1.026GB / 60.55GB (1.69%)
+- Storage: 42.4MB
+
+**Purpose:** LLM chat interface (ChatGPT-like UI for local models)
+
+**Dependencies:**
+- ollama (currently STOPPED ❌)
+- OpenAI API key (configured)
+
+**Access:**
+- Local: http://192.168.68.51:3000
+- No authentication by default
+
+**Issues:**
+- ⚠️ Depends on ollama container which is stopped
+- ⚠️ OpenAI API key exposed in environment variables
+
+**Recommendations:**
+1. ✅ **KEEP** - Active LLM interface
+2. Restart ollama container to enable local models
+3. Move API keys to Docker secrets
+4. Enable authentication
+
+**Priority:** HIGH - Core AI/ML service
+
+---
+
+### 2. NginxProxyManager ⭐⭐⭐
+
+**Status:** Running  
+**Container:** NginxProxyManager  
+**Image:** jlesage/nginx-proxy-manager (189MB)  
+**Created:** 2025-10-11 (3 weeks ago)  
+**Network:** bridge (172.17.0.4)  
+**Ports:** 4443→18443, 8080→1880, 8181→7818
+
+**Resources:**
+- CPU: 0.08%
+- Memory: 77.45MB (0.12%)
+- Storage: 13.4KB
+
+**Purpose:** Reverse proxy with web UI - SSL termination and routing
+
+**Dependencies:** None
+
+**Access:**
+- Admin UI: http://192.168.68.51:7818
+- HTTP: http://192.168.68.51:1880
+- HTTPS: https://192.168.68.51:18443
+
+**Configuration:**
+- Routes traffic to backend services
+- Manages SSL certificates
+- Provides access control
+
+**Recommendations:**
+1. ✅ **KEEP** - Critical infrastructure
+2. Document all proxy rules in Gitea
+3. Verify SSL auto-renewal is configured
+4. Enable MFA if available
+5. Review access logs regularly
+
+**Priority:** CRITICAL - Core infrastructure
+
+---
+
+### 3. Gitea ⭐⭐⭐
+
+**Status:** Running  
+**Container:** Gitea  
+**Image:** gitea/gitea (180MB)  
+**Created:** 2025-10-08 (3 weeks ago)  
+**Network:** bridge (172.17.0.3)  
+**Ports:** 22→22, 3000→3002
+
+**Resources:**
+- CPU: 0.11%
+- Memory: 114.5MB (0.18%)
+- Storage: 113MB (active repositories!)
+
+**Purpose:** Self-hosted Git server (GitHub alternative)
+
+**Dependencies:** None (internal SQLite)
+
+**Access:**
+- Web: http://192.168.68.51:3002
+- Domain: https://gitea.segelschiff.app
+- SSH: ssh://192.168.68.51:22 ⚠️ (conflicts with Unraid SSH)
+
+**Configuration:**
+- Using latest tag (unpinned version)
+- Storage: /mnt/user/appdata/gitea
+
+**Issues:**
+- ⚠️ SSH port 22 conflicts with Unraid SSH
+- ⚠️ Using `latest` tag (version not pinned)
+- ⚠️ Backup strategy unknown
+
+**Recommendations:**
+1. ✅ **KEEP** - Critical for version control
+2. Change SSH port to 2222 to avoid conflict
+3. Pin to specific version tag
+4. Implement automated backups (CRITICAL!)
+5. This is your version control hub - protect it!
+
+**Priority:** CRITICAL - Infrastructure documentation depends on this
+
+---
+
+### 4. ApacheGuacamole ⭐⭐
+
+**Status:** Running (2+ months uptime!)  
+**Container:** ApacheGuacamole  
+**Image:** jasonbean/guacamole (737MB)  
+**Created:** 2025-08-22 (2+ months ago)  
+**Network:** bridge (172.17.0.2)  
+**Ports:** 8080→4000
+
+**Resources:**
+- CPU: 0.16%
+- Memory: 785.8MB (1.27%)
+- Storage: 46.2MB
+
+**Purpose:** Clientless remote desktop gateway (RDP/VNC/SSH via browser)
+
+**Dependencies:**
+- MariaDB (STOPPED ❌) - **BROKEN DEPENDENCY!**
+
+**Access:**
+- Web: http://192.168.68.51:4000
+
+**Configuration:**
+- MySQL enabled but MariaDB stopped
+- Multiple auth modules: MySQL, LDAP, TOTP, etc.
+
+**Issues:**
+- 🚨 **CRITICAL:** Depends on MariaDB which is stopped!
+- Currently using embedded database (not recommended)
+- Data loss risk without proper database backend
+
+**Recommendations:**
+1. ⚠️ **FIX IMMEDIATELY** - Restart MariaDB or reconfigure
+2. If keeping: Start MariaDB and verify connection
+3. If not using: Stop Guacamole and remove both
+4. Document your use case for remote desktop access
+
+**Priority:** MEDIUM - Fix dependency or remove
+
+---
+
+### 5. Cloudflared ⭐⭐⭐
+
+**Status:** Running (2.5+ months - very stable!)  
+**Container:** Unraid-Cloudflared-Tunnel  
+**Image:** figro/unraid-cloudflared-tunnel (8.92MB)  
+**Created:** 2025-08-10 (2.5+ months ago)  
+**Network:** bridge (172.17.0.6)  
+**Ports:** 46495→46495 (metrics)
+
+**Resources:**
+- CPU: 0.33% (highest of running containers)
+- Memory: 68.6MB (0.11%)
+- Network I/O: 41.7MB RX / 310KB TX
+
+**Purpose:** Cloudflare Tunnel - secure external access without port forwarding
+
+**Dependencies:** None
+
+**Access:**
+- Metrics: http://192.168.68.51:46495
+- Domain: *.segelschiff.app (managed via Cloudflare)
+
+**Configuration:**
+- Tunnel token configured
+- No auto-update enabled
+- Metrics exposed for monitoring
+
+**Security:**
+- ⚠️ Tunnel token in plain text environment variable
+- ✅ No open ports on router (excellent!)
+
+**Recommendations:**
+1. ✅ **KEEP** - Excellent security practice
+2. Rotate tunnel token periodically
+3. Document which services are exposed
+4. Integrate metrics with monitoring stack
+
+**Priority:** HIGH - Critical for secure remote access
+
+---
+
+### 6. Vaultwarden ⭐⭐⭐
+
+**Status:** Running (healthy) - 3+ months uptime!  
+**Container:** vaultwarden  
+**Image:** vaultwarden/server (256MB)  
+**Created:** 2025-07-31 (3+ months ago)  
+**Network:** bridge (172.17.0.7)  
+**Ports:** 80→4743
+
+**Resources:**
+- CPU: 0.00% (idle)
+- Memory: 24.96MB (0.04%) - Very lightweight!
+
+**Purpose:** Self-hosted password manager (Bitwarden compatible)
+
+**Dependencies:** None
+
+**Access:**
+- Web: http://192.168.68.51:4743
+- Admin: http://192.168.68.51:4743/admin
+
+**Configuration:**
+- Signups allowed: true ⚠️
+- Invitations allowed: false ✅
+- WebSocket disabled ⚠️
+- Admin token exposed ⚠️
+
+**Issues:**
+- 🚨 **CRITICAL:** No backup strategy evident!
+- ⚠️ Admin token in plain text
+- ⚠️ Signups open (verify intentional)
+- ⚠️ WebSocket disabled (reduces functionality)
+
+**Recommendations:**
+1. ✅ **KEEP** - Critical security infrastructure
+2. 🚨 **IMPLEMENT BACKUP IMMEDIATELY** - This is your password vault!
+3. Close signups after initial setup
+4. Rotate admin token and use secrets management
+5. Enable WebSocket for better sync
+6. Automate daily backups to off-site location
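+
+A minimal sketch of how recommendation 6 could be automated. The `backup_dir` helper, the destination directory, and the 14-day retention window are illustrative assumptions, not existing configuration on this server.
+
+```bash
+#!/usr/bin/env bash
+# backup_dir SRC DEST_DIR [KEEP_DAYS]   (hypothetical helper)
+# Writes DEST_DIR/<name>-YYYYmmdd-HHMMSS.tar.gz and deletes archives of the
+# same name older than KEEP_DAYS (default 14, an assumed retention window).
+backup_dir() {
+  local src=$1 dest=$2 keep_days=${3:-14}
+  local name archive
+  name=$(basename "$src")
+  archive="$dest/$name-$(date +%Y%m%d-%H%M%S).tar.gz"
+  mkdir -p "$dest" || return 1
+  # -C stores paths relative to the parent directory instead of absolute ones
+  tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
+  # Prune old archives so the backup share does not grow without bound
+  find "$dest" -name "$name-*.tar.gz" -mtime +"$keep_days" -delete
+  echo "$archive"
+}
+```
+
+Wrapped between `docker stop vaultwarden` and `docker start vaultwarden`, a call such as `backup_dir /mnt/user/appdata/vaultwarden /mnt/user/backup/vaultwarden` could run from a daily scheduled job. The `/mnt/user/backup` path is assumed here, and copying the archive off-site remains a separate step.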
+
+**Priority:** CRITICAL - Contains all your passwords!
+
+---
+
+## 🔴 Recently Stopped Services (Worth Investigating)
+
+### 7. ollama ⚠️
+
+**Status:** Exited (128) 4 minutes ago  
+**Image:** ollama/ollama (3.33GB)  
+**Purpose:** Local LLM inference engine
+
+**Why It Matters:** open-webui depends on this!
+
+**Recommendations:**
+1. 🔧 **RESTART** - Required for open-webui local models
+2. Investigate exit code 128 (configuration issue?)
+3. Configure GPU acceleration (RTX 4090!)
+4. Test with open-webui after restart
+
+**Action:** `docker start ollama && docker logs -f ollama`
+
+---
+
+### 8. Monitoring Stack (Stopped 12 days ago) 🚨
+
+**Containers:**
+- Grafana (stopped 12 days)
+- InfluxDB (stopped 12 days)
+- Telegraf (stopped 12 days)
+
+**Total Size:** ~1.7GB
+
+**Why Critical:** Zero observability into system health!
+
+**Recommendations:**
+1. 🚨 **RESTART IMMEDIATELY** - Priority 1!
+2. Configure dashboards for:
+   - Docker container stats
+   - System resources (CPU, RAM, disk)
+   - Network traffic
+   - Temperature sensors
+3. Set up alerting for critical issues
+4. Document in runbook
+
+**Action:**
+```bash
+docker start Influxdb
+sleep 15  # Wait for DB initialization
+docker start Telegraf
+docker start Grafana
+```
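+
+The fixed `sleep 15` above is a guess at how long the database needs. As a hedged alternative, a small retry helper can poll until the database answers. `wait_for` is a hypothetical function, not part of Docker.
+
+```bash
+#!/usr/bin/env bash
+# wait_for TIMEOUT_SECONDS CMD [ARGS...]   (hypothetical helper)
+# Re-runs CMD once per second until it succeeds or the timeout elapses.
+wait_for() {
+  local timeout=$1; shift
+  local waited=0
+  until "$@" >/dev/null 2>&1; do
+    if [ "$waited" -ge "$timeout" ]; then
+      return 1   # gave up: CMD never succeeded within the timeout
+    fi
+    sleep 1
+    waited=$((waited + 1))
+  done
+  return 0
+}
+```
+
+For example, `docker start Influxdb && wait_for 60 curl -fsS http://localhost:8086/ping && docker start Telegraf Grafana` would start the dependents only once InfluxDB responds. Port 8086 is the InfluxDB default and is assumed here; adjust it to the actual host mapping.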
+
+---
+
+### 9. MariaDB (Stopped 12 days ago) ⚠️
+
+**Status:** Exited (0) 12 days ago  
+**Image:** lscr.io/linuxserver/mariadb (348MB)  
+**Purpose:** MySQL database for Guacamole
+
+**Issue:** Guacamole is running but database is stopped!
+
+**Recommendations:**
+1. If using Guacamole: **RESTART**
+2. If not using Guacamole: **REMOVE BOTH**
+3. Document decision
+
+---
+
+### 10. Database Admin Tools (Stopped 12 days ago)
+
+**CloudBeaver** - Stopped 12 days  
+**adminer** - Stopped 12 days
+
+**Issue:** Two database admin tools - redundant!
+
+**Recommendations:**
+1. **CHOOSE ONE:**
+   - CloudBeaver: Feature-rich (725MB)
+   - adminer: Lightweight (118MB)
+2. Remove the other
+3. Only restart if you need database management
+
+---
+
+## 🟡 Experimental / Inactive Services (Decision Needed)
+
+### 11. Nextcloud AIO Stack (8 containers!) 🚨
+
+**Status:** All stopped 3 weeks ago  
+**Total Size:** ~7GB Docker images + data  
+**Containers:**
+- nextcloud-aio-mastercontainer
+- nextcloud-aio-apache
+- nextcloud-aio-nextcloud (2.19GB)
+- nextcloud-aio-database (PostgreSQL)
+- nextcloud-aio-redis
+- nextcloud-aio-onlyoffice (3.79GB!)
+- nextcloud-aio-imaginary
+- nextcloud-aio-notify-push
+
+**Data:** /mnt/user/nextcloud (~1GB+)
+
+**Analysis:**
+- Massive resource footprint
+- "All-in-One" = heavy coupling
+- Stopped for 3 weeks suggests not critical
+
+**Recommendations:**
+**DECISION REQUIRED:**
+
+**Option A: Remove Everything**
+```bash
+# Backup data first!
+cp -r /mnt/user/nextcloud /mnt/user/backup/nextcloud-$(date +%Y%m%d)
+
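+# Remove containers (a shell glob does not expand against container names,
+# so select them with a name filter instead)
+docker rm $(docker ps -aq --filter "name=nextcloud-aio-")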
+
+# Remove images to free space
+docker rmi $(docker images | grep nextcloud | awk '{print $3}')
+
+# Archive data
+tar -czf nextcloud-data-backup.tar.gz /mnt/user/nextcloud
+```
+**Saves:** ~7GB+ space
+
+**Option B: Keep and Restart**
+- Document why you need it
+- Create restart procedure
+- Implement backup strategy
+- Monitor resource usage
+
+**My Recommendation:** Remove unless actively needed. Nextcloud is great but this All-in-One stack is heavy.
+
+---
+
+### 12. Jellyfin (Stopped 2 weeks ago) ⚠️
+
+**Status:** Exited (0) 2 weeks ago  
+**Image:** jellyfin/jellyfin (1.25GB)  
+**GPU:** RTX 4090 allocated but idle!
+
+**Media:**
+- Movies: /mnt/user/movies
+- TV: /mnt/user/tv shows
+- Music: /mnt/user/music
+
+**Issue:** $1600 GPU sitting idle!
+
+**Recommendations:**
+**If you want media server:**
+1. **RESTART** with hardware transcoding:
+   ```bash
+   docker start Jellyfin
+   ```
+2. Configure NVENC/NVDEC for RTX 4090
+3. Test 4K transcoding performance
+4. Switch from `host` network to bridge (security)
+
+**If you don't need media server:**
+1. Remove GPU allocation from container
+2. Free GPU for other projects (AI/ML)
+
+**Action Required:** Decide on media server strategy
+
+---
+
+### 13. Large AI/ML Containers (Rarely Used)
+
+**ebook2audiobook** - 20.06GB! (stopped 3 weeks)  
+**docling-serve** - 14.45GB! (stopped 2 weeks)
+
+**Total:** 34.5GB for two containers!
+
+**Analysis:**
+- Massive images
+- Rarely used (stopped weeks ago)
+- Experimental/one-time use?
+
+**Recommendations:**
+1. **REMOVE** both to free 34.5GB
+2. If needed again, pull fresh images
+3. Document use cases if keeping
+
+**Potential Savings:** 34.5GB cache space!
+
+---
+
+### 14. Productivity Suite (Multiple Stopped)
+
+**baserow** - Stopped 2 weeks (2.25GB)  
+**NocoDB** - Stopped 3 weeks (588MB)  
+**OpenProject** - Stopped 7 weeks (2.87GB)
+
+**Issue:** Three overlapping tools (baserow and NocoDB are no-code databases, OpenProject is a project manager) - redundant!
+
+**Recommendations:**
+1. **CHOOSE ONE** (or none if not used)
+2. Remove the others
+3. Migrate data if needed first
+
+**Potential Savings:** ~3-5GB if one is kept (depending on which), ~5.7GB if all three are removed
+
+---
+
+### 15. Development Tools
+
+**n8n** (workflow automation) - Created but never started  
+**steam-headless** - Created but not running
+
+**Recommendations:**
+- Document if you have plans for these
+- Remove if experimental and abandoned
+
+---
+
+## 📋 Container Decision Matrix
+
+| Container | Keep? | Action | Priority |
+|-----------|-------|--------|----------|
+| **open-webui** | ✅ Yes | Keep running, restart ollama | HIGH |
+| **NginxProxyManager** | ✅ Yes | Keep, document configs | CRITICAL |
+| **Gitea** | ✅ Yes | Keep, fix SSH port, backup | CRITICAL |
+| **ApacheGuacamole** | ⚠️ Decide | Fix MariaDB OR remove both | MEDIUM |
+| **Cloudflared** | ✅ Yes | Keep, rotate token | HIGH |
+| **Vaultwarden** | ✅ Yes | Keep, BACKUP NOW! | CRITICAL |
+| **ollama** | ✅ Yes | Restart immediately | HIGH |
+| **Monitoring Stack** | ✅ Yes | Restart all 3 containers | CRITICAL |
+| **MariaDB** | ⚠️ Conditional | If Guacamole stays | MEDIUM |
+| **Nextcloud AIO (8)** | ❌ Remove | Backup data, remove stack | LOW |
+| **Jellyfin** | ⚠️ Decide | Use GPU or remove | MEDIUM |
+| **ebook2audiobook** | ❌ Remove | Free 20GB | LOW |
+| **docling-serve** | ❌ Remove | Free 14.5GB | LOW |
+| **baserow/NocoDB/OpenProject** | ❌ Choose 1 | Remove others | LOW |
+| **CloudBeaver/adminer** | ⚠️ Choose 1 | Keep one DB admin | LOW |
+
+---
+
+## 🎯 Recommended Action Plan
+
+### Phase 1: Critical (Do First!) 🚨
+
+1. **Backup Vaultwarden** (30 min)
+   ```bash
+   docker stop vaultwarden
+   tar -czf vaultwarden-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/vaultwarden
+   docker start vaultwarden
+   ```
+
+2. **Backup Gitea** (30 min)
+   ```bash
+   docker stop Gitea
+   tar -czf gitea-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/gitea
+   docker start Gitea
+   ```
+
+3. **Restart Monitoring Stack** (15 min)
+   ```bash
+   docker start Influxdb && sleep 15
+   docker start Telegraf Grafana
+   # Configure dashboards
+   ```
+
+4. **Restart ollama** (5 min)
+   ```bash
+   docker start ollama
+   docker logs -f ollama
+   ```
+
+### Phase 2: Cleanup (Free Space!) 💾
+
+5. **Remove Large Unused Containers** (1 hour)
+   - ebook2audiobook (20GB)
+   - docling-serve (14.5GB)
+   - Nextcloud AIO stack (7GB)
+   - **Saves: ~41GB!**
+
+6. **Docker System Cleanup**
+   ```bash
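+   # Caution: besides unused images and build cache, this also deletes ALL
+   # stopped containers, so settle the Phase 3 keep/remove decisions first
+   docker system prune -a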
+   ```
+
+### Phase 3: Decisions (This Week)
+
+7. **Guacamole + MariaDB** - Keep or remove?
+8. **Jellyfin** - Restart with GPU or remove?
+9. **Productivity tools** - Choose one, remove others
+10. **Database admin** - CloudBeaver or adminer?
+
+---
+
+## 📊 Storage Cleanup Impact
+
+**Current Cache Usage:** 578GB / 932GB (63%)
+
+**After Recommended Cleanup:**
+- Remove ebook2audiobook: -20GB
+- Remove docling-serve: -14.5GB
+- Remove Nextcloud AIO: -7GB
+- Docker system prune: ~10-20GB
+- **Total Freed: ~50-60GB**
+
+**New Cache Usage:** ~520GB / 932GB (56%) ✅
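+
+The arithmetic behind this estimate can be reproduced with a short sketch. All figures come from this section; `cache_after_cleanup` is a hypothetical helper, and its argument is the estimated prune saving in GB (10 to 20 per the list above).
+
+```bash
+#!/usr/bin/env bash
+# cache_after_cleanup PRUNE_GB   (hypothetical helper)
+# Prints the space freed and the resulting cache usage.
+cache_after_cleanup() {
+  local prune_gb=$1
+  awk -v prune="$prune_gb" 'BEGIN {
+    used = 578; total = 932            # current cache usage in GB
+    freed = 20 + 14.5 + 7 + prune      # ebook2audiobook + docling-serve + Nextcloud AIO + prune
+    left = used - freed
+    printf "freed=%.1fGB used=%.1fGB (%.0f%%)\n", freed, left, 100 * left / total
+  }'
+}
+
+cache_after_cleanup 10   # conservative prune estimate
+cache_after_cleanup 20   # optimistic prune estimate
+```
+
+Both ends of the prune estimate land at roughly 55-56% usage, which matches the ~520GB figure above.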
+
+---
+
+## 🔐 Security Recommendations
+
+1. **Secrets Management** - Stop using plain text env vars
+2. **Close Open Signups** - Vaultwarden signups should be closed
+3. **SSH Port Conflict** - Fix Gitea port 22 conflict
+4. **Network Mode** - Move Jellyfin from `host` to `bridge`
+5. **Version Pinning** - Stop using `latest` tags
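+
+For item 5, a hedged sketch of how unpinned images could be listed. `find_unpinned` is a hypothetical helper that reads `<container> <image>` pairs and reports images with no tag or with `:latest`. It does not flag other moving tags, such as the `:main` tag open-webui uses.
+
+```bash
+#!/usr/bin/env bash
+# find_unpinned   (hypothetical helper)
+# stdin: one "<container-name> <image-ref>" pair per line
+# stdout: the pairs whose image has no tag or uses :latest
+find_unpinned() {
+  awk '{
+    ref = $2
+    if (ref ~ /@sha256:/) next           # pinned by digest
+    n = split(ref, parts, "/")
+    last = parts[n]                       # final path component, e.g. "gitea:latest"
+    if (last !~ /:/ || last ~ /:latest$/) print $1, ref
+  }'
+}
+```
+
+On the server it could be fed from Docker with `docker ps -a --format '{{.Names}} {{.Image}}' | find_unpinned`.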
+
+---
+
+## 📈 Resource Summary
+
+**Docker Images Total:** ~50GB  
+**Container Data:** Varies by appdata  
+**Cache Impact:** High (63% full)
+
+**Top Resource Consumers (Images):**
+1. ebook2audiobook: 20.06GB
+2. docling-serve: 14.45GB
+3. Nextcloud stack: ~7GB
+4. open-webui: 4.55GB
+5. OpenProject: 2.87GB
+
+---
+
+## 🎓 Key Takeaways
+
+1. **6 services are your core** - Keep these running
+2. **26 stopped containers** - Cleanup opportunity
+3. **~41GB can be freed** - By removing the large unused containers (~50-60GB including `docker system prune`)
+4. **No monitoring** - Critical gap (restart Grafana stack!)
+5. **Backup critical** - Vaultwarden and Gitea MUST be backed up
+
+---
+
+**Last Updated:** October 31, 2025  
+**Next Review:** After cleanup actions completed  
+**Maintained By:** Weston