weston 6cbee11482 Phase 1 Complete: Foundation documentation

Added comprehensive homelab documentation:

README.md:
- Hardware inventory and specifications
- Network architecture overview
- Running services catalog
- Quick reference commands
- Project goals and roadmap

docs/network-map.md:
- All device IP assignments
- Port reference guide
- DNS configuration (Pi-hole + Unbound)
- Remote access setup (Tailscale + Cloudflare)
- Troubleshooting commands

docs/service-inventory.md:
- All 32 Docker containers cataloged
- Running services analysis (6 containers)
- Stopped services review (26 containers)
- Resource usage and recommendations
- Container decision matrix
- Cleanup plan to free 40GB
- Security recommendations
- Prioritized action plan

docs/quick-start.md:
- Emergency recovery procedures
- Service restart sequences
- Backup/restore guides with scripts
- Troubleshooting by scenario
- Health check automation
- Post-recovery checklist
- Common problem solutions

This establishes the foundation for all future homelab projects.
Phase 1 documentation complete! 🎉

2025-11-01 00:42:34 +01:00

📦 Service Inventory - Complete Container Catalog

Last Updated: October 31, 2025
Total Containers: 32 (6 running, 26 stopped)
Purpose: Comprehensive catalog of all services

📊 Quick Stats

Metric	Value	Status
Total Containers	32	-
Running	6	✅ 19%
Stopped	26	⚠️ 81%
Total Docker Images	~50GB	⚠️ High
Cache Usage	578GB / 932GB	⚠️ 62%

Key Insight: 81% of containers are stopped - cleanup opportunity!
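
The percentages in the table can be recomputed whenever the container list changes. A minimal sketch, assuming a bash-compatible shell; `pct` is a hypothetical helper name, not something that exists on the server:

```shell
# pct COUNT TOTAL: print COUNT as a whole-number percentage of TOTAL,
# rounded to the nearest integer (hypothetical helper, integer math only).
pct() {
  local count="$1" total="$2"
  if [ "$total" -eq 0 ]; then
    echo 0
    return
  fi
  echo $(( (100 * count + total / 2) / total ))
}

# Example against a live Docker host (not executed here):
#   total=$(docker ps -aq | wc -l)
#   running=$(docker ps -q | wc -l)
#   echo "Running: $(pct "$running" "$total")%"
```

With the numbers from this inventory, `pct 6 32` prints 19 and `pct 26 32` prints 81.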

🟢 Running Services (6 containers)

1. open-webui ⭐⭐⭐

Status: Running (healthy)
Container: open-webui
Image: ghcr.io/open-webui/open-webui:main (4.55GB)
Created: 2025-10-16 (2 weeks ago)
Network: bridge (172.17.0.5)
Ports: 8080→3000

Resources:

CPU: 0.15%
Memory: 1.026GB / 60.55GB (1.69%)
Storage: 42.4MB

Purpose: LLM chat interface (ChatGPT-like UI for local models)

Dependencies:

ollama (currently STOPPED ❌)
OpenAI API key (configured)

Access:

Local: http://192.168.68.51:3000
No authentication by default

Issues:

⚠️ Depends on ollama container which is stopped
⚠️ OpenAI API key exposed in environment variables

Recommendations:

✅ KEEP - Active LLM interface
Restart ollama container to enable local models
Move API keys to Docker secrets
Enable authentication

Priority: HIGH - Core AI/ML service

2. NginxProxyManager ⭐⭐⭐

Status: Running
Container: NginxProxyManager
Image: jlesage/nginx-proxy-manager (189MB)
Created: 2025-10-11 (3 weeks ago)
Network: bridge (172.17.0.4)
Ports: 4443→18443, 8080→1880, 8181→7818

Resources:

CPU: 0.08%
Memory: 77.45MB (0.12%)
Storage: 13.4KB

Purpose: Reverse proxy with web UI - SSL termination and routing

Dependencies: None

Access:

Configuration:

Routes traffic to backend services
Manages SSL certificates
Provides access control

Recommendations:

✅ KEEP - Critical infrastructure
Document all proxy rules in Gitea
Verify SSL auto-renewal is configured
Enable MFA if available
Review access logs regularly

Priority: CRITICAL - Core infrastructure

3. Gitea ⭐⭐⭐

Status: Running
Container: Gitea
Image: gitea/gitea (180MB)
Created: 2025-10-08 (3 weeks ago)
Network: bridge (172.17.0.3)
Ports: 22→22, 3000→3002

Resources:

CPU: 0.11%
Memory: 114.5MB (0.18%)
Storage: 113MB (active repositories!)

Purpose: Self-hosted Git server (GitHub alternative)

Dependencies: None (internal SQLite)

Access:

Web: http://192.168.68.51:3002
Domain: https://gitea.segelschiff.app
SSH: ssh://192.168.68.51:22 ⚠️ (conflicts with Unraid SSH)

Configuration:

Using latest tag (unpinned version)
Storage: /mnt/user/appdata/gitea

Issues:

⚠️ SSH port 22 conflicts with Unraid SSH
⚠️ Using latest tag (version not pinned)
⚠️ Backup strategy unknown

Recommendations:

✅ KEEP - Critical for version control
Change SSH port to 2222 to avoid conflict
Pin to specific version tag
Implement automated backups (CRITICAL!)
This is your version control hub - protect it!

Priority: CRITICAL - Infrastructure documentation depends on this
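
To find containers that still run an unpinned image, something like the following may help. It is a sketch under assumptions: `is_pinned` is a hypothetical helper, and it only treats a missing tag or `latest` as unpinned. It cannot tell that a tag such as `main` also moves over time.

```shell
# is_pinned IMAGE_REF: succeed if the reference carries a digest or a tag
# other than "latest" (hypothetical helper; floating tags like "main" pass).
is_pinned() {
  local ref="$1" name
  case "$ref" in
    *@sha256:*) return 0 ;;      # digest references are always pinned
  esac
  name="${ref##*/}"              # last path component, drops any registry host:port
  case "$name" in
    *:latest) return 1 ;;
    *:*)      return 0 ;;
    *)        return 1 ;;        # no tag means Docker falls back to "latest"
  esac
}

# Example against a live container (not executed here):
#   image=$(docker inspect --format '{{.Config.Image}}' Gitea)
#   is_pinned "$image" || echo "Gitea runs an unpinned image: $image"
```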

4. ApacheGuacamole ⭐⭐

Status: Running (2+ months uptime!)
Container: ApacheGuacamole
Image: jasonbean/guacamole (737MB)
Created: 2025-08-22 (2+ months ago)
Network: bridge (172.17.0.2)
Ports: 8080→4000

Resources:

CPU: 0.16%
Memory: 785.8MB (1.27%)
Storage: 46.2MB

Purpose: Clientless remote desktop gateway (RDP/VNC/SSH via browser)

Dependencies:

MariaDB (STOPPED ❌) - BROKEN DEPENDENCY!

Access:

Web: http://192.168.68.51:4000

Configuration:

MySQL enabled but MariaDB stopped
Multiple auth modules: MySQL, LDAP, TOTP, etc.

Issues:

🚨 CRITICAL: Depends on MariaDB which is stopped!
Currently using embedded database (not recommended)
Data loss risk without proper database backend

Recommendations:

⚠️ FIX IMMEDIATELY - Restart MariaDB or reconfigure
If keeping: Start MariaDB and verify connection
If not using: Stop Guacamole and remove both
Document your use case for remote desktop access

Priority: MEDIUM - Fix dependency or remove
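
A broken dependency like this one can be detected with a small check. The sketch below assumes the database container is named `MariaDB` (the inventory does not state its exact container name); `is_running` and `check_dependency` are hypothetical helper names.

```shell
# is_running NAME: succeed if the named container exists and is running.
is_running() {
  [ "$(docker inspect --format '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# check_dependency APP DEP: fail when APP is running while DEP is not.
# Prints OK when both are running or when APP itself is stopped.
check_dependency() {
  local app="$1" dep="$2"
  if is_running "$app" && ! is_running "$dep"; then
    echo "BROKEN: $app is running but $dep is stopped"
    return 1
  fi
  echo "OK: $app -> $dep"
}

# Example (not executed here; container names are assumptions):
#   check_dependency ApacheGuacamole MariaDB
#   check_dependency open-webui ollama
```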

5. Cloudflared ⭐⭐⭐

Status: Running (2.5+ months - very stable!)
Container: Unraid-Cloudflared-Tunnel
Image: figro/unraid-cloudflared-tunnel (8.92MB)
Created: 2025-08-10 (2.5+ months ago)
Network: bridge (172.17.0.6)
Ports: 46495→46495 (metrics)

Resources:

CPU: 0.33% (highest of running containers)
Memory: 68.6MB (0.11%)
Network I/O: 41.7MB RX / 310KB TX

Purpose: Cloudflare Tunnel - secure external access without port forwarding

Dependencies: None

Access:

Metrics: http://192.168.68.51:46495
Domain: *.segelschiff.app (managed via Cloudflare)

Configuration:

Tunnel token configured
No auto-update enabled
Metrics exposed for monitoring

Security:

⚠️ Tunnel token in plain text environment variable
✅ No open ports on router (excellent!)

Recommendations:

✅ KEEP - Excellent security practice
Rotate tunnel token periodically
Document which services are exposed
Integrate metrics with monitoring stack

Priority: HIGH - Critical for secure remote access

6. Vaultwarden ⭐⭐⭐

Status: Running (healthy) - 3+ months uptime!
Container: vaultwarden
Image: vaultwarden/server (256MB)
Created: 2025-07-31 (3+ months ago)
Network: bridge (172.17.0.7)
Ports: 80→4743

Resources:

CPU: 0.00% (idle)
Memory: 24.96MB (0.04%) - Very lightweight!

Purpose: Self-hosted password manager (Bitwarden compatible)

Dependencies: None

Access:

Web: http://192.168.68.51:4743
Admin: http://192.168.68.51:4743/admin

Configuration:

Signups allowed: true ⚠️
Invitations allowed: false ✅
WebSocket disabled ⚠️
Admin token exposed ⚠️

Issues:

🚨 CRITICAL: No backup strategy evident!
⚠️ Admin token in plain text
⚠️ Signups open (verify intentional)
⚠️ WebSocket disabled (reduces functionality)

Recommendations:

✅ KEEP - Critical security infrastructure
🚨 IMPLEMENT BACKUP IMMEDIATELY - This is your password vault!
Close signups after initial setup
Rotate admin token and use secrets management
Enable WebSocket for better sync
Automate daily backups to off-site location

Priority: CRITICAL - Contains all your passwords!
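
The backup recommendation could be scripted roughly as follows. This is a sketch, not a finished backup solution: `backup_container` is a hypothetical helper, the appdata path `/mnt/user/appdata/vaultwarden` is taken from the action plan in this document, and copying the archive off-site is left out.

```shell
# backup_container NAME SRC_DIR DEST_DIR
# Stops the container, archives SRC_DIR into DEST_DIR under a dated name,
# restarts the container, and prints the archive path on success.
backup_container() {
  local name="$1" src="$2" dest="$3"
  local archive="$dest/$name-backup-$(date +%Y%m%d).tar.gz"
  mkdir -p "$dest" || return 1
  docker stop "$name" >/dev/null || return 1
  # Archive relative to the parent directory so the tarball holds no absolute paths.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
  local status=$?
  docker start "$name" >/dev/null   # restart even if tar failed
  [ "$status" -eq 0 ] && echo "$archive"
}

# Example (not executed here):
#   backup_container vaultwarden /mnt/user/appdata/vaultwarden /mnt/user/backup
```

Stopping the container first keeps the database file consistent while it is copied, at the cost of a short outage. The same function would work for Gitea with its own name and appdata path.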

🔴 Recently Stopped Services (Worth Investigating)

7. ollama ⚠️

Status: Exited (128) 4 minutes ago
Image: ollama/ollama (3.33GB)
Purpose: Local LLM inference engine

Why It Matters: open-webui depends on this!

Recommendations:

🔧 RESTART - Required for open-webui local models
Investigate exit code 128 (configuration issue?)
Configure GPU acceleration (RTX 4090!)
Test with open-webui after restart

Action: docker start ollama && docker logs -f ollama
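
Before restarting, it may be worth recording why the container exited. The sketch below interprets an exit code using the conventions documented for `docker run` (125, 126, 127) and the usual shell convention that codes above 128 encode a signal; `describe_exit` is a hypothetical helper. A plain 128 encodes no signal, so the helper only points at the logs.

```shell
# describe_exit CODE: print a rough interpretation of a container exit code.
describe_exit() {
  local code="$1"
  if [ "$code" -eq 0 ]; then
    echo "clean exit"
  elif [ "$code" -eq 125 ]; then
    echo "docker daemon error"
  elif [ "$code" -eq 126 ]; then
    echo "command could not be invoked"
  elif [ "$code" -eq 127 ]; then
    echo "command not found"
  elif [ "$code" -gt 128 ] && [ "$code" -lt 256 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "exit code $code: check State.Error and logs"
  fi
}

# Example against the stopped container (not executed here):
#   describe_exit "$(docker inspect --format '{{.State.ExitCode}}' ollama)"
#   docker inspect --format '{{.State.Error}} oom={{.State.OOMKilled}}' ollama
```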

8. Monitoring Stack (Stopped 12 days ago) 🚨

Containers:

Grafana (stopped 12 days)
InfluxDB (stopped 12 days)
Telegraf (stopped 12 days)

Total Size: ~1.7GB

Why Critical: Zero observability into system health!

Recommendations:

🚨 RESTART IMMEDIATELY - Priority 1!
Configure dashboards for:
- Docker container stats
- System resources (CPU, RAM, disk)
- Network traffic
- Temperature sensors
Set up alerting for critical issues
Document in runbook

Action:

docker start Influxdb
sleep 15  # Wait for DB initialization
docker start Telegraf
docker start Grafana
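
A fixed `sleep 15` can be too short or needlessly long. As an alternative sketch, the helper below polls a command until it succeeds. `wait_until` is a hypothetical name, and the example assumes InfluxDB is reachable on its default port 8086, which this inventory does not confirm.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND about once per second until it succeeds or the timeout passes.
wait_until() {
  local timeout="$1" elapsed=0
  shift
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Example (not executed here; port 8086 is an assumption):
#   docker start Influxdb
#   wait_until 60 curl -fsS http://192.168.68.51:8086/ping \
#     && docker start Telegraf Grafana
```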

9. MariaDB (Stopped 12 days ago) ⚠️

Status: Exited (0) 12 days ago
Image: lscr.io/linuxserver/mariadb (348MB)
Purpose: MySQL-compatible database for Guacamole

Issue: Guacamole is running but database is stopped!

Recommendations:

If using Guacamole: RESTART
If not using Guacamole: REMOVE BOTH
Document decision

10. Database Admin Tools (Stopped 12 days ago)

CloudBeaver - Stopped 12 days
adminer - Stopped 12 days

Issue: Two database admin tools - redundant!

Recommendations:

CHOOSE ONE:
- CloudBeaver: Feature-rich (725MB)
- adminer: Lightweight (118MB)
Remove the other
Only restart if you need database management

🟡 Experimental / Inactive Services (Decision Needed)

11. Nextcloud AIO Stack (7 containers!) 🚨

Status: All stopped 3 weeks ago
Total Size: ~7GB Docker images + data
Containers:

nextcloud-aio-mastercontainer
nextcloud-aio-apache
nextcloud-aio-nextcloud (2.19GB)
nextcloud-aio-database (PostgreSQL)
nextcloud-aio-redis
nextcloud-aio-onlyoffice (3.79GB!)
nextcloud-aio-imaginary
nextcloud-aio-notify-push

Data: /mnt/user/nextcloud (~1GB+)

Analysis:

Massive resource footprint
"All-in-One" = heavy coupling
Stopped for 3 weeks suggests not critical

Recommendations: DECISION REQUIRED:

Option A: Remove Everything

# Back up data first (cp -a preserves ownership, permissions, and timestamps)
cp -a /mnt/user/nextcloud /mnt/user/backup/nextcloud-$(date +%Y%m%d)

# Remove containers (docker rm does not expand wildcards, so look the names up)
docker rm $(docker ps -aq --filter "name=nextcloud-aio-")

# Remove images to free space
docker rmi $(docker images | grep nextcloud | awk '{print $3}')

# Archive data
tar -czf nextcloud-data-backup.tar.gz /mnt/user/nextcloud

Saves: ~7GB+ space

Option B: Keep and Restart

Document why you need it
Create restart procedure
Implement backup strategy
Monitor resource usage

My Recommendation: Remove unless actively needed. Nextcloud is great but this All-in-One stack is heavy.

12. Jellyfin (Stopped 2 weeks ago) ⚠️

Status: Exited (0) 2 weeks ago
Image: jellyfin/jellyfin (1.25GB)
GPU: RTX 4090 allocated but idle!

Media:

Movies: /mnt/user/movies
TV: /mnt/user/tv shows
Music: /mnt/user/music

Issue: $1600 GPU sitting idle!

Recommendations: If you want a media server:

RESTART with hardware transcoding:
```
docker start Jellyfin
```
Configure NVENC/NVDEC for RTX 4090
Test 4K transcoding performance
Switch from host network to bridge (security)

If you don't need a media server:

Remove GPU allocation from container
Free GPU for other projects (AI/ML)

Action Required: Decide on media server strategy

13. Large AI/ML Containers (Rarely Used)

ebook2audiobook - 20.06GB! (stopped 3 weeks)
docling-serve - 14.45GB! (stopped 2 weeks)

Total: 34.5GB for two containers!

Analysis:

Massive images
Rarely used (stopped weeks ago)
Experimental/one-time use?

Recommendations:

REMOVE both to free 34.5GB
If needed again, pull fresh images
Document use cases if keeping

Potential Savings: 34.5GB cache space!
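
The totals above can be re-derived from image sizes. A small sketch, assuming sizes in the format Docker prints (for example `20.06GB` or `588MB`); `sum_sizes_gb` is a hypothetical helper. Because images can share layers, a plain sum may overstate the space actually reclaimed.

```shell
# sum_sizes_gb: read one size per line on stdin (e.g. "20.06GB", "588MB")
# and print the total in GB with two decimals. Uses decimal units.
sum_sizes_gb() {
  awk '
    {
      value = $1 + 0                 # numeric prefix, e.g. 20.06
      unit = $1
      sub(/^[0-9.]+/, "", unit)      # what remains is the unit suffix
      if (unit == "GB")                        total += value
      else if (unit == "MB")                   total += value / 1000
      else if (unit == "kB" || unit == "KB")   total += value / 1000000
      else if (unit == "B")                    total += value / 1000000000
    }
    END { printf "%.2f\n", total }'
}

# Example (not executed here):
#   docker images --format '{{.Size}}' | sum_sizes_gb
```

Feeding it the two sizes listed above (`20.06GB` and `14.45GB`) prints 34.51.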

14. Productivity Suite (Multiple Stopped)

baserow - Stopped 2 weeks (2.25GB)
NocoDB - Stopped 3 weeks (588MB)
OpenProject - Stopped 7 weeks (2.87GB)

Issue: Three overlapping productivity tools - redundant! (baserow and NocoDB are both no-code databases; OpenProject is project management)

Recommendations:

CHOOSE ONE (or none if not used)
Remove the others
Migrate data if needed first

Potential Savings: ~5GB

15. Development Tools

n8n (workflow automation) - Created but never started
steam-headless - Created but not running

Recommendations:

Document if you have plans for these
Remove if experimental and abandoned

📋 Container Decision Matrix

Container	Keep?	Action	Priority
open-webui	✅ Yes	Keep running, restart ollama	HIGH
NginxProxyManager	✅ Yes	Keep, document configs	CRITICAL
Gitea	✅ Yes	Keep, fix SSH port, backup	CRITICAL
ApacheGuacamole	⚠️ Decide	Fix MariaDB OR remove both	MEDIUM
Cloudflared	✅ Yes	Keep, rotate token	HIGH
Vaultwarden	✅ Yes	Keep, BACKUP NOW!	CRITICAL
ollama	✅ Yes	Restart immediately	HIGH
Monitoring Stack	✅ Yes	Restart all 3 containers	CRITICAL
MariaDB	⚠️ Conditional	If Guacamole stays	MEDIUM
Nextcloud AIO (7)	❌ Remove	Backup data, remove stack	LOW
Jellyfin	⚠️ Decide	Use GPU or remove	MEDIUM
ebook2audiobook	❌ Remove	Free 20GB	LOW
docling-serve	❌ Remove	Free 14.5GB	LOW
baserow/NocoDB/OpenProject	❌ Choose 1	Remove others	LOW
CloudBeaver/adminer	⚠️ Choose 1	Keep one DB admin	LOW

🎯 Recommended Action Plan

Phase 1: Critical (Do First!) 🚨

Backup Vaultwarden (30 min)

docker stop vaultwarden
tar -czf vaultwarden-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/vaultwarden
docker start vaultwarden

Backup Gitea (30 min)

docker stop Gitea
tar -czf gitea-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/gitea
docker start Gitea

Restart Monitoring Stack (15 min)

docker start Influxdb && sleep 15
docker start Telegraf Grafana
# Configure dashboards

Restart ollama (5 min)

docker start ollama
docker logs -f ollama

Phase 2: Cleanup (Free Space!) 💾

Remove Large Unused Containers (1 hour)
- ebook2audiobook (20GB)
- docling-serve (14.5GB)
- Nextcloud AIO stack (7GB)
- Saves: ~41GB!

Docker System Cleanup

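docker system prune -a
# Removes ALL stopped containers, unused networks, unused images, and build cache.
# Run it only after the Phase 3 decisions; `docker image prune -a` keeps stopped containers.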
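
Since most containers in this inventory are stopped, it can help to preview what a prune would touch before running it. A sketch, with `preview_prune` as a hypothetical helper that only lists and never deletes:

```shell
# preview_prune: list the stopped containers a system prune would remove
# and show Docker's own reclaimable-space summary. Nothing is deleted.
preview_prune() {
  echo "== Stopped containers (would be removed) =="
  docker ps -a --filter status=exited --filter status=created \
    --format '{{.Names}}\t{{.Image}}\t{{.Status}}'
  echo "== Disk usage summary =="
  docker system df
}

# Example (not executed here):
#   preview_prune
```

If the list contains anything still under discussion (MariaDB, Jellyfin, the productivity tools), hold off on the prune until that decision is made.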

Phase 3: Decisions (This Week)

Guacamole + MariaDB - Keep or remove?
Jellyfin - Restart with GPU or remove?
Productivity tools - Choose one, remove others
Database admin - CloudBeaver or adminer?

📊 Storage Cleanup Impact

Current Cache Usage: 578GB / 932GB (62%)

After Recommended Cleanup:

Remove ebook2audiobook: -20GB
Remove docling-serve: -14.5GB
Remove Nextcloud AIO: -7GB
Docker system prune: ~10-20GB
Total Freed: ~50-60GB

New Cache Usage: ~520GB / 932GB (56%) ✅
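
The projection can be checked with a little arithmetic. The three removals add up to 41.5GB, and the prune estimate of 10-20GB brings the total to 51.5-61.5GB. A sketch using `awk` for the floating-point math; `projected_pct` is a hypothetical helper:

```shell
# projected_pct USED_GB TOTAL_GB FREED_GB
# Print cache usage as a percentage after FREED_GB has been reclaimed.
projected_pct() {
  awk -v used="$1" -v total="$2" -v freed="$3" \
    'BEGIN { printf "%.1f\n", 100 * (used - freed) / total }'
}

projected_pct 578 932 51.5   # low estimate:  prints 56.5
projected_pct 578 932 61.5   # high estimate: prints 55.4
```

Both ends of the range land close to the ~56% quoted above.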

🔐 Security Recommendations

Secrets Management - Stop using plain text env vars
Close Open Signups - Vaultwarden signups should be closed
SSH Port Conflict - Fix Gitea port 22 conflict
Network Mode - Move Jellyfin from host to bridge
Version Pinning - Stop using latest tags
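
To see which containers carry credentials in plain-text environment variables, a filter like the one below may be a starting point. It is a sketch: `secret_var_names` is a hypothetical helper, the keyword list is a guess at common naming patterns, and it prints variable names only so that values do not end up in terminal history or logs.

```shell
# secret_var_names: read KEY=VALUE lines on stdin and print only the names
# of variables whose name looks like a credential. Values are never printed.
secret_var_names() {
  grep -iE '^[^=]*(key|token|secret|password|passwd)[^=]*=' | cut -d= -f1
}

# Example against a live container (not executed here):
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' open-webui \
#     | secret_var_names
```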

📈 Resource Summary

Docker Images Total: ~50GB
Container Data: Varies by appdata
Cache Impact: High (62% full)

Top Resource Consumers (Images):

ebook2audiobook: 20.06GB
docling-serve: 14.45GB
Nextcloud stack: ~7GB
open-webui: 4.55GB
OpenProject: 2.87GB

🎓 Key Takeaways

6 services are your core - Keep these running
26 stopped containers - Cleanup opportunity
~40GB can be freed - Significant space available
No monitoring - Critical gap (restart Grafana stack!)
Backup critical - Vaultwarden and Gitea MUST be backed up

Last Updated: October 31, 2025
Next Review: After cleanup actions completed
Maintained By: Weston
