# Project Memory: dlx-ansible

## Infrastructure Overview
- **NPM Server**: nginx (192.168.200.71) - Nginx Proxy Manager for SSL termination
- **Application Servers**: hiveops (192.168.200.112), smartjournal (192.168.200.114)
- **CI/CD Server**: jenkins (192.168.200.91) - Jenkins + SonarQube
- All servers use `dlxadmin` user with passwordless sudo

## Critical Learnings

### SSL Certificate Offloading with Nginx Proxy Manager

**Problem**: Spring Boot applications behind NPM experience redirect loops when accessed via HTTPS.

**Root Cause**: Spring Boot doesn't trust `X-Forwarded-*` headers by default. When NPM terminates SSL and forwards plain HTTP to the backend, Spring sees an HTTP request and redirects to HTTPS, creating an infinite loop.

**Solution**: Configure Spring Boot to trust forwarded headers:
```yaml
environment:
  # Current setting (Spring Boot 2.2 and later)
  SERVER_FORWARD_HEADERS_STRATEGY: native
  # Legacy setting for older Spring Boot versions; quoted because Compose expects string values
  SERVER_USE_FORWARD_HEADERS: "true"
```

**Key Points**:
- Containers must be **recreated** (not restarted) for env vars to take effect
- Verify with: `curl -I -H 'X-Forwarded-Proto: https' http://localhost:8080/`
- Success indicator: `Strict-Transport-Security` header in response
- Documentation: `docs/SSL-OFFLOADING-FIX.md`

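
The verification step can be scripted so that it fits into a health check or a playbook task. The sketch below is one possible approach, not the project's own tooling: `has_hsts` is a hypothetical helper name, and the URL in the usage comment assumes the backend listens on port 8080 as in the `curl` command above.

```bash
# Reads HTTP response headers on stdin and succeeds (exit 0) only if a
# Strict-Transport-Security header is present. Header names are
# case-insensitive, so the match ignores case.
has_hsts() {
  grep -qi '^strict-transport-security:'
}

# Hypothetical usage (adjust host and port to the service being checked):
#   curl -sI -H 'X-Forwarded-Proto: https' http://localhost:8080/ | has_hsts \
#     && echo "forwarded headers trusted" \
#     || echo "HSTS header missing"
```

Keeping the check as a filter over stdin means the same function works on saved header dumps as well as live `curl` output.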
### Docker Compose Best Practices

**Environment Variable Loading**:
- Use the `--env-file` flag when `.env` is not in the same directory as the compose file
- Example: `docker compose -f docker/docker-compose.yml --env-file .env up -d`

**Container Updates**:
- Restart: Keeps the existing container, doesn't apply env changes
- Recreate: Removes the old container, creates a new one with the latest env/config
- Always recreate when changing environment variables

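
To tell whether a running container has actually picked up a changed variable, its environment can be compared against the expected value. This is a minimal sketch rather than an established project script: `env_has` is a hypothetical helper, and the container name `hiveops-mgmt` in the usage comment is a placeholder.

```bash
# Reads KEY=VALUE lines on stdin (one per line) and succeeds only if the
# exact pair "$1=$2" appears as a whole line.
env_has() {
  grep -qxF -- "$1=$2"
}

# Hypothetical usage: recreate the service only if the variable is missing.
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' hiveops-mgmt \
#     | env_has SERVER_FORWARD_HEADERS_STRATEGY native \
#     || docker compose up -d --force-recreate mgmt
```

The `--force-recreate` flag makes `docker compose up` replace the container even when Compose does not detect a configuration change.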
### HiveOps Application Structure

**Main Deployment** (`/opt/hiveops-deploy/`):
- Full microservices stack
- Services: incident-backend, incident-frontend, mgmt, remote
- Managed via docker-compose

**Standalone Deployment** (`/home/hiveops/`):
- Simplified incident management system
- Separate from main deployment
- Used for direct hiveops.directlx.dev access

### Jenkins Firewall Blocking (2026-02-09)

**Problem**: Jenkins and SonarQube were unreachable from the network.

**Root Cause**: The server had no host_vars file, so it inherited the default firewall config (SSH only).

**Solution**: Created `host_vars/jenkins.yml` with ports 22, 8080 (Jenkins), 9000 (SonarQube).

**Quick Fix**:
```bash
# Open the Jenkins and SonarQube ports in UFW
ansible jenkins -m community.general.ufw -a "rule=allow port=8080 proto=tcp" -b
ansible jenkins -m community.general.ufw -a "rule=allow port=9000 proto=tcp" -b
# Start the SonarQube containers (database first)
ansible jenkins -m shell -a "docker start postgresql sonarqube" -b
```

**Key Points**:
- Jenkins runs as a Java system service (not Docker) on port 8080
- SonarQube runs in Docker with a PostgreSQL backend
- Always create a host_vars file for servers with specific firewall needs
- Documentation: `docs/JENKINS-CONNECTIVITY-FIX.md`

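
Once the firewall rules are in place, reachability can be confirmed from another machine on the network. The following is a rough sketch under stated assumptions: `check_port` is a hypothetical helper, it relies on bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command, and the 3-second limit is an arbitrary choice.

```bash
# Succeeds (exit 0) if a TCP connection to host "$1" on port "$2" opens
# within 3 seconds; fails if the connection is refused or times out.
check_port() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Hypothetical usage against the jenkins host from the overview:
#   for port in 8080 9000; do
#     check_port 192.168.200.91 "$port" && echo "$port open" || echo "$port blocked"
#   done
```

A timeout usually points at a firewall dropping packets, while an immediate refusal suggests the port is allowed but nothing is listening on it.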
## File Locations

### Host Variables
- `/source/dlx-src/dlx-ansible/host_vars/npm.yml` - NPM firewall config
- `/source/dlx-src/dlx-ansible/host_vars/hiveops.yml` - HiveOps settings
- `/source/dlx-src/dlx-ansible/host_vars/smartjournal.yml` - SmartJournal settings
- `/source/dlx-src/dlx-ansible/host_vars/jenkins.yml` - Jenkins/SonarQube firewall config

### Playbooks Created
- `playbooks/fix-hiveops-ssl-offload.yml` - SSL offload fix automation
- `playbooks/fix-hiveops-compose-indentation.yml` - Compose file corrections
- `playbooks/fix-hiveops-mgmt-ssl.yml` - Management service SSL fix

### Templates
- `templates/hiveops-docker-compose.prod.yml.j2` - Corrected compose template

## Storage Remediation (2026-02-08)

**Critical Issues Identified**:
1. proxmox-00 root FS: 84.5% full (CRITICAL)
2. proxmox-01 dlx-docker: 81.1% full (HIGH)
3. Unused containers: 1.2 TB allocated
4. SonarQube: 354 GB (82% of allocation)

**Remediation Playbooks Created**:
- `remediate-storage-critical-issues.yml`: Log cleanup, Docker prune, audits
- `remediate-docker-storage.yml`: Deep Docker cleanup + automation
- `remediate-stopped-containers.yml`: Safe container removal with backups
- `configure-storage-monitoring.yml`: Proactive monitoring (5/10 min checks)

**Documentation**:
- `STORAGE-AUDIT.md`: Full hardware/storage analysis (550 lines)
- `STORAGE-REMEDIATION-GUIDE.md`: Step-by-step execution (480 lines)
- `REMEDIATION-SUMMARY.md`: Quick reference (300 lines)

**Expected Results**:
- Total space freed: 1-2 TB
- proxmox-00: 84.5% → 70% (10-15 GB freed)
- proxmox-01: 81.1% → 70% (50-150 GB freed)
- Automation prevents regrowth (weekly prune + hourly monitoring)

**Commit**: 90ed5c1

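
For a quick manual check of the usage figures above, outside the monitoring playbook, filesystems over a threshold can be listed straight from `df`. This is only a sketch: `over_threshold` is a hypothetical helper, the default of 80 is an assumption chosen to sit just below the two usage figures listed above, and it is not taken from `configure-storage-monitoring.yml`.

```bash
# Reads `df -P` output on stdin and prints "<mount point> <percent>" for
# every filesystem whose usage is at or above the threshold in "$1"
# (default 80). Mount points containing spaces are not handled.
over_threshold() {
  awk -v limit="${1:-80}" 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)                      # strip the trailing percent sign
    if (pct + 0 >= limit + 0) print $6, pct "%"
  }'
}

# Hypothetical usage on a Proxmox host:
#   df -P | over_threshold 80
```

The `-P` flag keeps each filesystem on a single line in the POSIX column layout, which is what makes the fixed field positions safe to rely on.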
## Common Tasks

### Fix SSL Offloading for Spring Boot Service
1. Add env vars to .env: `SERVER_FORWARD_HEADERS_STRATEGY=native`, `SERVER_USE_FORWARD_HEADERS=true`
2. Add the same variables to the service's `environment` section in the compose file
3. Recreate container: `docker stop <name> && docker rm <name> && docker compose up -d <service>`
4. Verify: Check for `Strict-Transport-Security` header

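
Step 1 is easy to repeat by accident and end up with duplicate lines in `.env`. A small idempotent helper avoids that. This is a sketch rather than part of the existing playbooks: `ensure_env_var` is a hypothetical name, and it assumes simple `KEY=VALUE` lines without quoting or backslashes in the value.

```bash
# Sets KEY=VALUE in an env file: rewrites the line if KEY already exists,
# otherwise appends a new line. Running it twice leaves one entry.
ensure_env_var() {
  file=$1
  key=$2
  value=$3
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    # Rewrite through a temp file, then copy back so the original file's
    # permissions and ownership are kept.
    tmp=$(mktemp)
    awk -v k="$key" -v v="$value" 'BEGIN { FS = OFS = "=" }
      $1 == k { print k, v; next }
      { print }' "$file" > "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Hypothetical usage:
#   ensure_env_var .env SERVER_FORWARD_HEADERS_STRATEGY native
#   ensure_env_var .env SERVER_USE_FORWARD_HEADERS true
```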
### Apply Firewall Configuration
- Firewall is managed by common role (roles/common/tasks/security.yml)
- Controlled per-host via `common_firewall_enabled` and `common_firewall_allowed_ports`
- Some hosts (docker, hiveops, smartjournal) have firewall disabled for Docker networking

### Run Storage Remediation
1. Test with `--check`: `ansible-playbook playbooks/remediate-storage-critical-issues.yml --check`
2. Deploy monitoring: `ansible-playbook playbooks/configure-storage-monitoring.yml -l proxmox`
3. Fix proxmox-00: `ansible-playbook playbooks/remediate-storage-critical-issues.yml -l proxmox-00`
4. Fix proxmox-01: `ansible-playbook playbooks/remediate-docker-storage.yml -l proxmox-01`
5. Monitor: `tail -f /var/log/storage-monitor.log`
6. Remove containers (optional): `ansible-playbook playbooks/remediate-stopped-containers.yml -e dry_run=false`

## Security Notes
- Only trust forwarded headers when backend is not internet-accessible
- NPM server (192.168.200.71) should be only server that can reach backend ports
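- Backend ports should bind to localhost only (`127.0.0.1:8080:8080`) when the proxy runs on the same host; a port published on 127.0.0.1 is not reachable from a separate NPM server, so in that case restrict the port to the NPM address instead
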
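
To audit how a container's ports are published, the output of `docker port` can be filtered for bindings on all interfaces. This is a sketch under assumptions: `exposed_bindings` is a hypothetical helper, the container name in the usage comment is a placeholder, and the patterns assume the `PORT/proto -> ADDRESS:PORT` line format that `docker port` prints.

```bash
# Reads `docker port <container>` output on stdin and prints every mapping
# published on all interfaces (0.0.0.0 or IPv6 ::) rather than a specific
# address such as 127.0.0.1. Prints nothing when all bindings are restricted.
exposed_bindings() {
  grep -E -- '-> (0\.0\.0\.0|\[::\]|::):[0-9]+$' || true
}

# Hypothetical usage:
#   docker port hiveops-mgmt | exposed_bindings
```

Any line printed is a port that relies entirely on the host firewall to keep out traffic from machines other than the proxy.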