# Project Memory: dlx-ansible

## Infrastructure Overview

- **NPM Server**: nginx (192.168.200.71) - Nginx Proxy Manager for SSL termination
- **Application Servers**: hiveops (192.168.200.112), smartjournal (192.168.200.114)
- **CI/CD Server**: jenkins (192.168.200.91) - Jenkins + SonarQube
- All servers use the `dlxadmin` user with passwordless sudo

## Critical Learnings

### SSL Certificate Offloading with Nginx Proxy Manager

**Problem**: Spring Boot applications behind NPM experience redirect loops when accessed via HTTPS.

**Root Cause**: Spring Boot doesn't trust `X-Forwarded-*` headers by default. When NPM terminates SSL and forwards plain HTTP to the backend, Spring sees an HTTP request and redirects it to HTTPS, creating an infinite loop.

**Solution**: Configure Spring Boot to trust forwarded headers:

```yaml
environment:
  SERVER_FORWARD_HEADERS_STRATEGY: native
  # Quoted so YAML passes a string, not a boolean, to Compose
  SERVER_USE_FORWARD_HEADERS: "true"
```

**Key Points**:

- Containers must be **recreated** (not restarted) for env vars to take effect
- Verify with: `curl -I -H 'X-Forwarded-Proto: https' http://localhost:8080/`
- Success indicator: `Strict-Transport-Security` header in the response
- Documentation: `docs/SSL-OFFLOADING-FIX.md`

### Docker Compose Best Practices

**Environment Variable Loading**:

- Use the `--env-file` flag when `.env` is not in the same directory as the compose file
- Example: `docker compose -f docker/docker-compose.yml --env-file .env up -d`

**Container Updates**:

- Restart: Keeps the existing container, doesn't apply env changes
- Recreate: Removes the old container, creates a new one with the latest env/config
- Always recreate when changing environment variables

### HiveOps Application Structure

**Main Deployment** (`/opt/hiveops-deploy/`):

- Full microservices stack
- Services: incident-backend, incident-frontend, mgmt, remote
- Managed via docker-compose

**Standalone Deployment** (`/home/hiveops/`):

- Simplified incident management system
- Separate from the main deployment
- Used for direct hiveops.directlx.dev access

### Jenkins Firewall Blocking (2026-02-09)

**Problem**: Jenkins and SonarQube were unreachable from the network.

**Root Cause**: The server had no host_vars file, so it inherited the default firewall config (SSH only).

**Solution**: Created `host_vars/jenkins.yml` with ports 22, 8080 (Jenkins), 9000 (SonarQube).

**Quick Fix**:

```bash
# Open the Jenkins and SonarQube ports in UFW
ansible jenkins -m community.general.ufw -a "rule=allow port=8080 proto=tcp" -b
ansible jenkins -m community.general.ufw -a "rule=allow port=9000 proto=tcp" -b
# Start the SonarQube containers (PostgreSQL backend first)
ansible jenkins -m shell -a "docker start postgresql sonarqube" -b
```

**Key Points**:

- Jenkins runs as a Java system service (not Docker) on port 8080
- SonarQube runs in Docker with a PostgreSQL backend
- Always create a host_vars file for servers with specific firewall needs
- Documentation: `docs/JENKINS-CONNECTIVITY-FIX.md`

## File Locations

### Host Variables

- `/source/dlx-src/dlx-ansible/host_vars/npm.yml` - NPM firewall config
- `/source/dlx-src/dlx-ansible/host_vars/hiveops.yml` - HiveOps settings
- `/source/dlx-src/dlx-ansible/host_vars/smartjournal.yml` - SmartJournal settings
- `/source/dlx-src/dlx-ansible/host_vars/jenkins.yml` - Jenkins/SonarQube firewall config

### Playbooks Created

- `playbooks/fix-hiveops-ssl-offload.yml` - SSL offload fix automation
- `playbooks/fix-hiveops-compose-indentation.yml` - Compose file corrections
- `playbooks/fix-hiveops-mgmt-ssl.yml` - Management service SSL fix

### Templates

- `templates/hiveops-docker-compose.prod.yml.j2` - Corrected compose template

## Storage Remediation (2026-02-08)

**Critical Issues Identified**:

1. proxmox-00 root FS: 84.5% full (CRITICAL)
2. proxmox-01 dlx-docker: 81.1% full (HIGH)
3. Unused containers: 1.2 TB allocated
4. SonarQube: 354 GB (82% of allocation)

**Remediation Playbooks Created**:

- `remediate-storage-critical-issues.yml`: Log cleanup, Docker prune, audits
- `remediate-docker-storage.yml`: Deep Docker cleanup + automation
- `remediate-stopped-containers.yml`: Safe container removal with backups
- `configure-storage-monitoring.yml`: Proactive monitoring (5/10 min checks)

**Documentation**:

- `STORAGE-AUDIT.md`: Full hardware/storage analysis (550 lines)
- `STORAGE-REMEDIATION-GUIDE.md`: Step-by-step execution (480 lines)
- `REMEDIATION-SUMMARY.md`: Quick reference (300 lines)

**Expected Results**:

- Total space freed: 1-2 TB
- proxmox-00: 84.5% → 70% (10-15 GB freed)
- proxmox-01: 81.1% → 70% (50-150 GB freed)
- Automation prevents regrowth (weekly prune + hourly monitoring)

**Commit**: 90ed5c1

## Common Tasks

### Fix SSL Offloading for Spring Boot Service

1. Add env vars to `.env`: `SERVER_FORWARD_HEADERS_STRATEGY=native`, `SERVER_USE_FORWARD_HEADERS=true`
2. Add them to the docker-compose `environment` section
3. Recreate the container: `docker stop <container> && docker rm <container> && docker compose up -d <service>`
4. Verify: Check for the `Strict-Transport-Security` header

### Apply Firewall Configuration

- Firewall is managed by the common role (`roles/common/tasks/security.yml`)
- Controlled per host via `common_firewall_enabled` and `common_firewall_allowed_ports`
- Some hosts (docker, hiveops, smartjournal) have the firewall disabled for Docker networking

### Run Storage Remediation

1. Test with `--check`: `ansible-playbook playbooks/remediate-storage-critical-issues.yml --check`
2. Deploy monitoring: `ansible-playbook playbooks/configure-storage-monitoring.yml -l proxmox`
3. Fix proxmox-00: `ansible-playbook playbooks/remediate-storage-critical-issues.yml -l proxmox-00`
4. Fix proxmox-01: `ansible-playbook playbooks/remediate-docker-storage.yml -l proxmox-01`
5. Monitor: `tail -f /var/log/storage-monitor.log`
6. Remove containers (optional): `ansible-playbook playbooks/remediate-stopped-containers.yml -e dry_run=false`

## Security Notes

- Only trust forwarded headers when the backend is not internet-accessible
- The NPM server (192.168.200.71) should be the only server that can reach backend ports
- Backend ports should bind to localhost only: `127.0.0.1:8080:8080` (this applies only when the proxy runs on the same host; a remote NPM cannot reach a port bound to 127.0.0.1)
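
The localhost-binding rule in the last bullet can be spot-checked with a small script. This is a minimal sketch, not part of the repository: the function name `is_loopback_binding` and the sample mappings are hypothetical, and it only understands the short `HOST_IP:HOST_PORT:CONTAINER_PORT` string form, not Compose's long-form `ports` syntax.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) only when a short-form Compose
# port mapping is bound to the loopback address.
is_loopback_binding() {
  local mapping="$1"
  case "$mapping" in
    127.0.0.1:*:*) return 0 ;;  # e.g. 127.0.0.1:8080:8080
    *)             return 1 ;;  # e.g. 8080:8080 or 0.0.0.0:8080:8080
  esac
}

# Illustrative mappings only; not taken from any real compose file.
for mapping in "127.0.0.1:8080:8080" "8080:8080" "0.0.0.0:9000:9000"; do
  if is_loopback_binding "$mapping"; then
    echo "OK      $mapping"
  else
    echo "EXPOSED $mapping"
  fi
done
```

A mapping with no host IP, such as `8080:8080`, publishes the port on all interfaces, which is why the sketch treats it as exposed.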