Phase 1 Complete: Foundation documentation

Added comprehensive homelab documentation:

README.md:
- Hardware inventory and specifications
- Network architecture overview
- Running services catalog
- Quick reference commands
- Project goals and roadmap

docs/network-map.md:
- All device IP assignments
- Port reference guide
- DNS configuration (Pi-hole + Unbound)
- Remote access setup (Tailscale + Cloudflare)
- Troubleshooting commands

docs/service-inventory.md:
- All 32 Docker containers cataloged
- Running services analysis (6 containers)
- Stopped services review (26 containers)
- Resource usage and recommendations
- Container decision matrix
- Cleanup plan to free 40GB
- Security recommendations
- Prioritized action plan

docs/quick-start.md:
- Emergency recovery procedures
- Service restart sequences
- Backup/restore guides with scripts
- Troubleshooting by scenario
- Health check automation
- Post-recovery checklist
- Common problem solutions

This establishes the foundation for all future homelab projects.
Phase 1 documentation complete! 🎉
2025-11-01 00:42:34 +01:00
parent e768ccb902
commit 6cbee11482
5 changed files with 3428 additions and 0 deletions

docs/service-inventory.md

@@ -0,0 +1,614 @@
# 📦 Service Inventory - Complete Container Catalog
**Last Updated:** October 31, 2025
**Total Containers:** 32 (6 running, 26 stopped)
**Purpose:** Comprehensive catalog of all services
---
## 📊 Quick Stats
| Metric | Value | Status |
|--------|-------|--------|
| **Total Containers** | 32 | - |
| **Running** | 6 | ✅ 19% |
| **Stopped** | 26 | ⚠️ 81% |
| **Total Docker Images** | ~50GB | ⚠️ High |
| **Cache Usage** | 578GB / 932GB | ⚠️ 63% |
**Key Insight:** 81% of containers are stopped - cleanup opportunity!
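
The counts and percentages above can be recomputed from live data instead of being maintained by hand. A minimal sketch, assuming the `docker` CLI is on the PATH; `pct` and `container_stats` are hypothetical helper names, not Docker commands:

```bash
# pct PART TOTAL -> integer percentage, rounded to the nearest whole number.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        if (total == 0) { print 0; exit }   # avoid division by zero
        printf "%.0f\n", part * 100 / total
    }'
}

# container_stats -> running/stopped counts with percentages.
# `docker ps -q` lists running container IDs, `docker ps -aq` lists all of them.
container_stats() {
    total=$(docker ps -aq | wc -l | tr -d ' ')
    running=$(docker ps -q | wc -l | tr -d ' ')
    stopped=$((total - running))
    echo "Running: $running ($(pct "$running" "$total")%)"
    echo "Stopped: $stopped ($(pct "$stopped" "$total")%)"
}
```

Running `container_stats` on the host prints both lines; `pct 6 32` and `pct 26 32` reproduce the 19% and 81% shown in the table.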
---
## 🟢 Running Services (6 containers)
### 1. open-webui ⭐⭐⭐
**Status:** Running (healthy)
**Container:** open-webui
**Image:** ghcr.io/open-webui/open-webui:main (4.55GB)
**Created:** 2025-10-16 (2 weeks ago)
**Network:** bridge (172.17.0.5)
**Ports:** 8080 → 3000
**Resources:**
- CPU: 0.15%
- Memory: 1.026GB / 60.55GB (1.69%)
- Storage: 42.4MB
**Purpose:** LLM chat interface (ChatGPT-like UI for local models)
**Dependencies:**
- ollama (currently STOPPED ❌)
- OpenAI API key (configured)
**Access:**
- Local: http://192.168.68.51:3000
- No authentication by default
**Issues:**
- ⚠️ Depends on ollama container which is stopped
- ⚠️ OpenAI API key exposed in environment variables
**Recommendations:**
1. **KEEP** - Active LLM interface
2. Restart ollama container to enable local models
3. Move API keys to Docker secrets
4. Enable authentication
**Priority:** HIGH - Core AI/ML service
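
To see which containers carry secrets in their environment, the variable names can be listed without printing the values. This is a rough sketch: `secret_env_names` is a hypothetical helper, and the keyword pattern is an assumption that will miss unusually named variables.

```bash
# secret_env_names CONTAINER -> names (never values) of env vars that look like secrets.
secret_env_names() {
    # .Config.Env holds the container's environment as NAME=value strings.
    docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' "$1" |
        cut -d= -f1 |
        grep -iE 'key|token|secret|passw' || true   # no match is not an error
}
```

For example, `secret_env_names open-webui` should list the API key variable until it is moved out of the environment.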
---
### 2. NginxProxyManager ⭐⭐⭐
**Status:** Running
**Container:** NginxProxyManager
**Image:** jlesage/nginx-proxy-manager (189MB)
**Created:** 2025-10-11 (3 weeks ago)
**Network:** bridge (172.17.0.4)
**Ports:** 4443→18443, 8080→1880, 8181→7818
**Resources:**
- CPU: 0.08%
- Memory: 77.45MB (0.12%)
- Storage: 13.4KB
**Purpose:** Reverse proxy with web UI - SSL termination and routing
**Dependencies:** None
**Access:**
- Admin UI: http://192.168.68.51:7818
- HTTP: http://192.168.68.51:1880
- HTTPS: https://192.168.68.51:18443
**Configuration:**
- Routes traffic to backend services
- Manages SSL certificates
- Provides access control
**Recommendations:**
1. **KEEP** - Critical infrastructure
2. Document all proxy rules in Gitea
3. Verify SSL auto-renewal is configured
4. Enable MFA if available
5. Review access logs regularly
**Priority:** CRITICAL - Core infrastructure
---
### 3. Gitea ⭐⭐⭐
**Status:** Running
**Container:** Gitea
**Image:** gitea/gitea (180MB)
**Created:** 2025-10-08 (3 weeks ago)
**Network:** bridge (172.17.0.3)
**Ports:** 22→22, 3000→3002
**Resources:**
- CPU: 0.11%
- Memory: 114.5MB (0.18%)
- Storage: 113MB (active repositories!)
**Purpose:** Self-hosted Git server (GitHub alternative)
**Dependencies:** None (internal SQLite)
**Access:**
- Web: http://192.168.68.51:3002
- Domain: https://gitea.segelschiff.app
- SSH: ssh://192.168.68.51:22 ⚠️ (conflicts with Unraid SSH)
**Configuration:**
- Using latest tag (unpinned version)
- Storage: /mnt/user/appdata/gitea
**Issues:**
- ⚠️ SSH port 22 conflicts with Unraid SSH
- ⚠️ Using `latest` tag (version not pinned)
- ⚠️ Backup strategy unknown
**Recommendations:**
1. **KEEP** - Critical for version control
2. Change SSH port to 2222 to avoid conflict
3. Pin to specific version tag
4. Implement automated backups (CRITICAL!)
5. This is your version control hub - protect it!
**Priority:** CRITICAL - Infrastructure documentation depends on this
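
The SSH port conflict can be checked from the published port list before and after the change. A small sketch using `docker port`; `host_ports` and `uses_host_port` are hypothetical helper names:

```bash
# host_ports CONTAINER -> host ports published by CONTAINER, one per line.
# `docker port` prints lines such as "22/tcp -> 0.0.0.0:22".
host_ports() {
    docker port "$1" | sed 's/.*://' | sort -un
}

# uses_host_port CONTAINER PORT -> exit 0 if CONTAINER publishes host port PORT.
uses_host_port() {
    host_ports "$1" | grep -qx "$2"
}
```

With the current mapping, `uses_host_port Gitea 22` succeeds. After remapping SSH to 2222 it should fail, and `uses_host_port Gitea 2222` should succeed.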
---
### 4. ApacheGuacamole ⭐⭐
**Status:** Running (2+ months uptime!)
**Container:** ApacheGuacamole
**Image:** jasonbean/guacamole (737MB)
**Created:** 2025-08-22 (2+ months ago)
**Network:** bridge (172.17.0.2)
**Ports:** 8080→4000
**Resources:**
- CPU: 0.16%
- Memory: 785.8MB (1.27%)
- Storage: 46.2MB
**Purpose:** Clientless remote desktop gateway (RDP/VNC/SSH via browser)
**Dependencies:**
- MariaDB (STOPPED ❌) - **BROKEN DEPENDENCY!**
**Access:**
- Web: http://192.168.68.51:4000
**Configuration:**
- MySQL enabled but MariaDB stopped
- Multiple auth modules: MySQL, LDAP, TOTP, etc.
**Issues:**
- 🚨 **CRITICAL:** Depends on MariaDB which is stopped!
- Currently using embedded database (not recommended)
- Data loss risk without proper database backend
**Recommendations:**
1. ⚠️ **FIX IMMEDIATELY** - Restart MariaDB or reconfigure
2. If keeping: Start MariaDB and verify connection
3. If not using: Stop Guacamole and remove both
4. Document your use case for remote desktop access
**Priority:** MEDIUM - Fix dependency or remove
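
A broken dependency like this one is easy to detect automatically. The sketch below assumes the container names match the headings in this document (`ApacheGuacamole`, `MariaDB`); `is_running` and `check_dep` are hypothetical helpers.

```bash
# is_running NAME -> exit 0 if the container exists and is running.
is_running() {
    [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# check_dep SERVICE DEPENDENCY -> warn when SERVICE runs without DEPENDENCY.
check_dep() {
    if is_running "$1" && ! is_running "$2"; then
        echo "WARNING: $1 is running but its dependency $2 is not"
        return 1
    fi
    echo "OK: $1 -> $2"
}
```

`check_dep ApacheGuacamole MariaDB` and `check_dep open-webui ollama` cover the two broken pairs found in this inventory.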
---
### 5. Cloudflared ⭐⭐⭐
**Status:** Running (2.5+ months - very stable!)
**Container:** Unraid-Cloudflared-Tunnel
**Image:** figro/unraid-cloudflared-tunnel (8.92MB)
**Created:** 2025-08-10 (2.5+ months ago)
**Network:** bridge (172.17.0.6)
**Ports:** 46495→46495 (metrics)
**Resources:**
- CPU: 0.33% (highest of running containers)
- Memory: 68.6MB (0.11%)
- Network I/O: 41.7MB RX / 310KB TX
**Purpose:** Cloudflare Tunnel - secure external access without port forwarding
**Dependencies:** None
**Access:**
- Metrics: http://192.168.68.51:46495
- Domain: *.segelschiff.app (managed via Cloudflare)
**Configuration:**
- Tunnel token configured
- No auto-update enabled
- Metrics exposed for monitoring
**Security:**
- ⚠️ Tunnel token in plain text environment variable
- ✅ No open ports on router (excellent!)
**Recommendations:**
1. **KEEP** - Excellent security practice
2. Rotate tunnel token periodically
3. Document which services are exposed
4. Integrate metrics with monitoring stack
**Priority:** HIGH - Critical for secure remote access
---
### 6. Vaultwarden ⭐⭐⭐
**Status:** Running (healthy) - 3+ months uptime!
**Container:** vaultwarden
**Image:** vaultwarden/server (256MB)
**Created:** 2025-07-31 (3+ months ago)
**Network:** bridge (172.17.0.7)
**Ports:** 80→4743
**Resources:**
- CPU: 0.00% (idle)
- Memory: 24.96MB (0.04%) - Very lightweight!
**Purpose:** Self-hosted password manager (Bitwarden compatible)
**Dependencies:** None
**Access:**
- Web: http://192.168.68.51:4743
- Admin: http://192.168.68.51:4743/admin
**Configuration:**
- Signups allowed: true ⚠️
- Invitations allowed: false ✅
- WebSocket disabled ⚠️
- Admin token exposed ⚠️
**Issues:**
- 🚨 **CRITICAL:** No backup strategy evident!
- ⚠️ Admin token in plain text
- ⚠️ Signups open (verify intentional)
- ⚠️ WebSocket disabled (reduces functionality)
**Recommendations:**
1. **KEEP** - Critical security infrastructure
2. 🚨 **IMPLEMENT BACKUP IMMEDIATELY** - This is your password vault!
3. Close signups after initial setup
4. Rotate admin token and use secrets management
5. Enable WebSocket for better sync
6. Automate daily backups to off-site location
**Priority:** CRITICAL - Contains all your passwords!
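
Daily backups need a retention rule, or the backup share fills up. A minimal sketch of that one piece, assuming archives are `*.tar.gz` files in a single directory; `prune_backups` is a hypothetical helper, and the off-site copy is not covered here.

```bash
# prune_backups DIR DAYS -> delete *.tar.gz archives in DIR that were
# last modified more than DAYS days ago. Other files are left alone.
prune_backups() {
    find "$1" -maxdepth 1 -type f -name '*.tar.gz' -mtime +"$2" -exec rm -f {} +
}
```

For example, `prune_backups /mnt/user/backup 14` keeps roughly two weeks of archives. Try it on a copy of the directory first.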
---
## 🔴 Recently Stopped Services (Worth Investigating)
### 7. ollama ⚠️
**Status:** Exited (128) 4 minutes ago
**Image:** ollama/ollama (3.33GB)
**Purpose:** Local LLM inference engine
**Why It Matters:** open-webui depends on this!
**Recommendations:**
1. 🔧 **RESTART** - Required for open-webui local models
2. Investigate exit code 128 (configuration issue?)
3. Configure GPU acceleration (RTX 4090!)
4. Test with open-webui after restart
**Action:** `docker start ollama && docker logs -f ollama`
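
Before restarting, it helps to read the recorded exit state. The sketch below turns an exit code into a hint. The hint text is a rough guide, not an official Docker classification, and `describe_exit` is a hypothetical helper.

```bash
# describe_exit CODE -> short hint for a container exit code.
describe_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "clean exit"
    elif [ "$code" -gt 128 ]; then
        # Shell convention: 128 + N means the process died from signal N.
        echo "killed by signal $((code - 128))"
    elif [ "$code" -eq 128 ]; then
        echo "no signal encoded; check State.Error and the logs"
    elif [ "$code" -ge 125 ]; then
        echo "command could not be started (125 daemon error, 126 not executable, 127 not found)"
    else
        echo "application error; check the logs"
    fi
}
```

Usage: `describe_exit "$(docker inspect -f '{{.State.ExitCode}}' ollama)"`, followed by `docker inspect -f '{{.State.Error}}' ollama` to see any error the daemon recorded.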
---
### 8. Monitoring Stack (Stopped 12 days ago) 🚨
**Containers:**
- Grafana (stopped 12 days)
- InfluxDB (stopped 12 days)
- Telegraf (stopped 12 days)
**Total Size:** ~1.7GB
**Why Critical:** Zero observability into system health!
**Recommendations:**
1. 🚨 **RESTART IMMEDIATELY** - Priority 1!
2. Configure dashboards for:
- Docker container stats
- System resources (CPU, RAM, disk)
- Network traffic
- Temperature sensors
3. Set up alerting for critical issues
4. Document in runbook
**Action:**
```bash
docker start Influxdb
sleep 15 # Wait for DB initialization
docker start Telegraf
docker start Grafana
```
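
The fixed `sleep 15` is a guess at how long the database needs. A polling loop is more robust. This is a sketch: `wait_for` is a hypothetical helper, and the health URL in the usage comment assumes InfluxDB is published on its default port 8086, which this inventory does not confirm.

```bash
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND once per second until it succeeds or the timeout elapses.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + 1))
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s: $*" >&2
            return 1
        fi
        sleep 1
    done
}

# Usage (port 8086 is an assumption):
#   docker start Influxdb
#   wait_for 60 curl -fsS http://localhost:8086/ping && docker start Telegraf Grafana
```

If the wait times out, Telegraf and Grafana are not started, which makes the failure visible instead of hiding it behind a sleep.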
---
### 9. MariaDB (Stopped 12 days ago) ⚠️
**Status:** Exited (0) 12 days ago
**Image:** lscr.io/linuxserver/mariadb (348MB)
**Purpose:** MySQL-compatible database for Guacamole
**Issue:** Guacamole is running but database is stopped!
**Recommendations:**
1. If using Guacamole: **RESTART**
2. If not using Guacamole: **REMOVE BOTH**
3. Document decision
---
### 10. Database Admin Tools (Stopped 12 days ago)
**CloudBeaver** - Stopped 12 days
**adminer** - Stopped 12 days
**Issue:** Two database admin tools - redundant!
**Recommendations:**
1. **CHOOSE ONE:**
- CloudBeaver: Feature-rich (725MB)
- adminer: Lightweight (118MB)
2. Remove the other
3. Only restart if you need database management
---
## 🟡 Experimental / Inactive Services (Decision Needed)
### 11. Nextcloud AIO Stack (8 containers!) 🚨
**Status:** All stopped 3 weeks ago
**Total Size:** ~7GB Docker images + data
**Containers:**
- nextcloud-aio-mastercontainer
- nextcloud-aio-apache
- nextcloud-aio-nextcloud (2.19GB)
- nextcloud-aio-database (PostgreSQL)
- nextcloud-aio-redis
- nextcloud-aio-onlyoffice (3.79GB!)
- nextcloud-aio-imaginary
- nextcloud-aio-notify-push
**Data:** /mnt/user/nextcloud (~1GB+)
**Analysis:**
- Massive resource footprint
- "All-in-One" = heavy coupling
- Stopped for 3 weeks suggests not critical
**Recommendations:**
**DECISION REQUIRED:**
**Option A: Remove Everything**
```bash
# Back up data first! (-a preserves ownership, permissions, and timestamps)
cp -a /mnt/user/nextcloud /mnt/user/backup/nextcloud-$(date +%Y%m%d)
# Remove containers (docker rm does not expand wildcards, so filter by name)
docker rm $(docker ps -aq --filter "name=nextcloud-aio-")
# Remove images to free space
docker rmi $(docker images | grep nextcloud | awk '{print $3}')
# Archive data
tar -czf nextcloud-data-backup.tar.gz /mnt/user/nextcloud
```
**Saves:** ~7GB+ space
**Option B: Keep and Restart**
- Document why you need it
- Create restart procedure
- Implement backup strategy
- Monitor resource usage
**My Recommendation:** Remove unless actively needed. Nextcloud is great but this All-in-One stack is heavy.
---
### 12. Jellyfin (Stopped 2 weeks ago) ⚠️
**Status:** Exited (0) 2 weeks ago
**Image:** jellyfin/jellyfin (1.25GB)
**GPU:** RTX 4090 allocated but idle!
**Media:**
- Movies: /mnt/user/movies
- TV: /mnt/user/tv shows
- Music: /mnt/user/music
**Issue:** $1600 GPU sitting idle!
**Recommendations:**
**If you want media server:**
1. **RESTART** with hardware transcoding:
```bash
docker start Jellyfin
```
2. Configure NVENC/NVDEC for RTX 4090
3. Test 4K transcoding performance
4. Switch from `host` network to bridge (security)
**If you don't need media server:**
1. Remove GPU allocation from container
2. Free GPU for other projects (AI/ML)
**Action Required:** Decide on media server strategy
---
### 13. Large AI/ML Containers (Rarely Used)
**ebook2audiobook** - 20.06GB! (stopped 3 weeks)
**docling-serve** - 14.45GB! (stopped 2 weeks)
**Total:** 34.5GB for two containers!
**Analysis:**
- Massive images
- Rarely used (stopped weeks ago)
- Experimental/one-time use?
**Recommendations:**
1. **REMOVE** both to free 34.5GB
2. If needed again, pull fresh images
3. Document use cases if keeping
**Potential Savings:** 34.5GB cache space!
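
To find the next cleanup candidates, images can be ranked by size. A sketch that parses the human-readable sizes printed by `docker images` (decimal units such as `kB`, `MB`, `GB`); `largest_images` is a hypothetical helper.

```bash
# largest_images [N] -> the N largest images (default 5) as "size<TAB>repo:tag".
largest_images() {
    docker images --format '{{.Size}} {{.Repository}}:{{.Tag}}' |
        awk '{
            mb = $1 + 0                           # numeric prefix of e.g. "20.1GB"
            if ($1 ~ /GB$/) mb *= 1000
            else if ($1 ~ /kB$/) mb /= 1000
            else if ($1 !~ /MB$/) mb /= 1000000   # plain bytes
            printf "%.1f\t%s\t%s\n", mb, $1, $2   # sort key, size, name
        }' | sort -rn | head -n "${1:-5}" | cut -f2-
}
```

`largest_images 10` should put ebook2audiobook and docling-serve at the top until they are removed.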
---
### 14. Productivity Suite (Multiple Stopped)
**baserow** - Stopped 2 weeks (2.25GB)
**NocoDB** - Stopped 3 weeks (588MB)
**OpenProject** - Stopped 7 weeks (2.87GB)
**Issue:** Three overlapping tools (baserow and NocoDB are no-code databases, OpenProject is project management) - redundant!
**Recommendations:**
1. **CHOOSE ONE** (or none if not used)
2. Remove the others
3. Migrate data if needed first
**Potential Savings:** ~5GB
---
### 15. Development Tools
**n8n** (workflow automation) - Created but never started
**steam-headless** - Created but not running
**Recommendations:**
- Document if you have plans for these
- Remove if experimental and abandoned
---
## 📋 Container Decision Matrix
| Container | Keep? | Action | Priority |
|-----------|-------|--------|----------|
| **open-webui** | ✅ Yes | Keep running, restart ollama | HIGH |
| **NginxProxyManager** | ✅ Yes | Keep, document configs | CRITICAL |
| **Gitea** | ✅ Yes | Keep, fix SSH port, backup | CRITICAL |
| **ApacheGuacamole** | ⚠️ Decide | Fix MariaDB OR remove both | MEDIUM |
| **Cloudflared** | ✅ Yes | Keep, rotate token | HIGH |
| **Vaultwarden** | ✅ Yes | Keep, BACKUP NOW! | CRITICAL |
| **ollama** | ✅ Yes | Restart immediately | HIGH |
| **Monitoring Stack** | ✅ Yes | Restart all 3 containers | CRITICAL |
| **MariaDB** | ⚠️ Conditional | If Guacamole stays | MEDIUM |
| **Nextcloud AIO (8)** | ❌ Remove | Backup data, remove stack | LOW |
| **Jellyfin** | ⚠️ Decide | Use GPU or remove | MEDIUM |
| **ebook2audiobook** | ❌ Remove | Free 20GB | LOW |
| **docling-serve** | ❌ Remove | Free 14.5GB | LOW |
| **baserow/NocoDB/OpenProject** | ⚠️ Choose 1 | Remove others | LOW |
| **CloudBeaver/adminer** | ⚠️ Choose 1 | Keep one DB admin | LOW |
---
## 🎯 Recommended Action Plan
### Phase 1: Critical (Do First!) 🚨
1. **Backup Vaultwarden** (30 min)
```bash
docker stop vaultwarden
tar -czf vaultwarden-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/vaultwarden
docker start vaultwarden
```
2. **Backup Gitea** (30 min)
```bash
docker stop Gitea
tar -czf gitea-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/gitea
docker start Gitea
```
3. **Restart Monitoring Stack** (15 min)
```bash
docker start Influxdb && sleep 15
docker start Telegraf Grafana
# Configure dashboards
```
4. **Restart ollama** (5 min)
```bash
docker start ollama
docker logs -f ollama
```
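
The two backup steps above follow the same stop, archive, start pattern, so they can share one helper that restarts the container even when the archive step fails. A minimal sketch, assuming the appdata paths shown above and a writable destination directory; `backup_container` is a hypothetical helper.

```bash
# backup_container NAME SRC_DIR DEST_DIR
# Stop NAME, archive SRC_DIR into DEST_DIR, restart NAME, then verify the archive.
backup_container() {
    name=$1 src=$2 dest=$3
    archive="$dest/$name-backup-$(date +%Y%m%d).tar.gz"
    docker stop "$name" >/dev/null || return 1
    status=0
    # -C stores paths relative to the parent of SRC_DIR instead of absolute paths.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || status=1
    docker start "$name" >/dev/null || status=1   # restart even if tar failed
    # Listing the archive catches truncated or corrupt files.
    if [ "$status" -eq 0 ] && tar -tzf "$archive" >/dev/null; then
        echo "$archive"
    else
        status=1
    fi
    return "$status"
}
```

Example: `backup_container vaultwarden /mnt/user/appdata/vaultwarden /mnt/user/backup` prints the archive path on success and returns non-zero otherwise.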
### Phase 2: Cleanup (Free Space!) 💾
5. **Remove Large Unused Containers** (1 hour)
- ebook2audiobook (20GB)
- docling-serve (14.5GB)
- Nextcloud AIO stack (7GB)
- **Saves: ~41GB!**
6. **Docker System Cleanup**
```bash
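# Removes ALL stopped containers and every image not used by a running container,
# plus unused networks and build cache. Finish the Phase 3 decisions first,
# or start any container you want to keep before running this.
docker system prune -a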
```
### Phase 3: Decisions (This Week)
7. **Guacamole + MariaDB** - Keep or remove?
8. **Jellyfin** - Restart with GPU or remove?
9. **Productivity tools** - Choose one, remove others
10. **Database admin** - CloudBeaver or adminer?
---
## 📊 Storage Cleanup Impact
**Current Cache Usage:** 578GB / 932GB (63%)
**After Recommended Cleanup:**
- Remove ebook2audiobook: -20GB
- Remove docling-serve: -14.5GB
- Remove Nextcloud AIO: -7GB
- Docker system prune: ~10-20GB
- **Total Freed: ~50-60GB**
**New Cache Usage:** ~520GB / 932GB (56%) ✅
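
The arithmetic behind these figures: the three removals total 20 + 14.5 + 7 = 41.5GB, and the estimated 10-20GB from the prune brings the range to 51.5-61.5GB. A small sketch to recompute the resulting usage; `usage_after` is a hypothetical helper.

```bash
# usage_after USED_GB TOTAL_GB FREED_GB -> "<new used>GB (<percent>%)"
usage_after() {
    awk -v used="$1" -v total="$2" -v freed="$3" 'BEGIN {
        new = used - freed
        printf "%.1fGB (%.0f%%)\n", new, new * 100 / total
    }'
}
```

`usage_after 578 932 51.5` and `usage_after 578 932 61.5` give the low and high ends of the estimate.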
---
## 🔐 Security Recommendations
1. **Secrets Management** - Stop using plain text env vars
2. **Close Open Signups** - Vaultwarden signups should be closed
3. **SSH Port Conflict** - Fix Gitea port 22 conflict
4. **Network Mode** - Move Jellyfin from `host` to `bridge`
5. **Version Pinning** - Stop using `latest` tags
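
For the version pinning item, containers on an untagged or `latest` image can be listed automatically. A sketch under the assumption that any other explicit tag counts as pinned; `is_unpinned` and `list_unpinned` are hypothetical helpers.

```bash
# is_unpinned IMAGE -> exit 0 when IMAGE has no tag or the tag is "latest".
is_unpinned() {
    case "$1" in
        *@sha256:*) return 1 ;;     # pinned by digest
    esac
    name=${1##*/}                   # drop registry/namespace (may contain ":port")
    case "$name" in
        *:latest) return 0 ;;
        *:*)      return 1 ;;       # some other explicit tag
        *)        return 0 ;;       # no tag means "latest"
    esac
}

# list_unpinned -> "container<TAB>image" for every container on an unpinned image.
list_unpinned() {
    docker ps -a --format '{{.Names}} {{.Image}}' |
        while read -r cname image; do
            if is_unpinned "$image"; then
                printf '%s\t%s\n' "$cname" "$image"
            fi
        done
}
```

Moving tags such as `main` pass this check even though they also change over time, so review those by hand.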
---
## 📈 Resource Summary
**Docker Images Total:** ~50GB
**Container Data:** Varies by appdata
**Cache Impact:** High (63% full)
**Top Resource Consumers (Images):**
1. ebook2audiobook: 20.06GB
2. docling-serve: 14.45GB
3. Nextcloud stack: ~7GB
4. open-webui: 4.55GB
5. OpenProject: 2.87GB
---
## 🎓 Key Takeaways
1. **6 services are your core** - Keep these running
2. **26 stopped containers** - Cleanup opportunity
3. **~41GB can be freed** by removing unused containers (~50-60GB with a Docker prune) - Significant space available
4. **No monitoring** - Critical gap (restart Grafana stack!)
5. **Backup critical** - Vaultwarden and Gitea MUST be backed up
---
**Last Updated:** October 31, 2025
**Next Review:** After cleanup actions completed
**Maintained By:** Weston