From 1d9896e6a1117e2ca6f1d6339979ec777e04b0e2 Mon Sep 17 00:00:00 2001
From: directlx
Date: Mon, 9 Feb 2026 13:33:05 -0500
Subject: [PATCH] Initial Claude configuration and memory for dlx-ansible

Add Claude Code configurations:
- Memory file with infrastructure knowledge and critical learnings
- Project-specific CLAUDE.md with commands and patterns
- Security audit summary documentation
- Repository structure and documentation

Co-Authored-By: Claude Sonnet 4.5
---
 README.md                                     |  37 +++
 .../security/SECURITY-AUDIT-SUMMARY.md        | 230 ++++++++++++++++++
 memory/dlx-ansible/MEMORY.md                  | 141 +++++++++++
 project-configs/dlx-ansible/CLAUDE.md         |  25 ++
 4 files changed, 433 insertions(+)
 create mode 100644 documentation/security/SECURITY-AUDIT-SUMMARY.md
 create mode 100644 memory/dlx-ansible/MEMORY.md
 create mode 100644 project-configs/dlx-ansible/CLAUDE.md

diff --git a/README.md b/README.md
index 73c0d78..643b3bb 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,39 @@
 # dlx-claude
 
+Claude Code configurations, memory, and documentation for DirectLX infrastructure projects.
+
+## Purpose
+
+This repository stores Claude-specific files that help Claude Code understand and work effectively with DirectLX infrastructure:
+
+- **Memory files**: Persistent knowledge about infrastructure, issues encountered, and solutions
+- **Project configurations**: CLAUDE.md files with project-specific guidance
+- **Documentation**: Security audits, troubleshooting guides, and best practices
+
+## Repository Structure
+
+```
+dlx-claude/
+├── memory/
+│   └── dlx-ansible/              # Claude's memory for Ansible project
+│       └── MEMORY.md             # Infrastructure knowledge, learnings, fixes
+│
+├── project-configs/
+│   └── dlx-ansible/              # Project-specific Claude configuration
+│       └── CLAUDE.md             # Instructions for working with dlx-ansible
+│
+├── documentation/
+│   └── security/                 # Security-related documentation
+│       └── SECURITY-AUDIT-SUMMARY.md
+│
+└── README.md                     # This file
+```
+
+## Last Updated
+
+2026-02-09
+
+---
+
+**Repository**: http://192.168.200.102/directlx/dlx-claude
+**Gitea Server**: 192.168.200.102
diff --git a/documentation/security/SECURITY-AUDIT-SUMMARY.md b/documentation/security/SECURITY-AUDIT-SUMMARY.md
new file mode 100644
index 0000000..5b211eb
--- /dev/null
+++ b/documentation/security/SECURITY-AUDIT-SUMMARY.md
@@ -0,0 +1,230 @@
+# Security Audit Summary
+
+**Date**: 2026-02-09
+**Servers Audited**: 16
+**Full Report**: `/tmp/security-audit-full-report.txt`
+
+## Executive Summary
+
+Security audit completed across all infrastructure servers. Multiple security concerns identified ranging from **CRITICAL** to **LOW** priority.
+
+## Critical Security Findings
+
+### 🔴 CRITICAL
+
+1. **Root Login Enabled via SSH** (`ansible-node`, `gitea`)
+   - **Risk**: Direct root access increases attack surface
+   - **Affected**: 2 servers
+   - **Recommendation**: Disable root login immediately
+   ```yaml
+   PermitRootLogin no
+   ```
+
+2. **No Firewall on Multiple Servers**
+   - **Risk**: All ports exposed to network
+   - **Affected**: `ansible-node`, `gitea`, and others
+   - **Recommendation**: Enable UFW with strict rules
+
+3. **Password Authentication Enabled on Jenkins**
+   - **Risk**: We enabled this for temporary AWS access
+   - **Status**: Known configuration (for AWS Jenkins Master)
+   - **Recommendation**: Switch to key-based auth when possible
+
+### 🟠 HIGH
+
+4. **Automatic Updates Not Configured**
+   - **Risk**: Servers missing security patches
+   - **Affected**: `ansible-node`, `docker`, and most servers
+   - **Recommendation**: Enable unattended-upgrades
+
+5. **Security Updates Available**
+   - **Critical**: `docker` has **65 pending security updates**
+   - **Recommendation**: Apply immediately
+   ```bash
+   ansible docker -m apt -a "upgrade=dist update_cache=yes" -b
+   ```
+
+6. **Multiple Services Exposed on Docker Server**
+   - **Risk**: Ports 5000, 8000-8082, 8443, 9000, 11434 publicly accessible
+   - **Firewall**: Currently disabled
+   - **Recommendation**: Enable firewall, restrict to internal network
+
+### 🟡 MEDIUM
+
+7. **Password-Based Users on Multiple Servers**
+   - **Users with passwords**: root, dlxadmin, directlx, jenkins
+   - **Risk**: Potential brute-force targets
+   - **Recommendation**: Enforce strong password policies
+
+8. **PermitRootLogin Enabled**
+   - **Affected**: Several Proxmox nodes
+   - **Risk**: Root SSH access possible
+   - **Recommendation**: Disable after confirming Proxmox compatibility
+
+## Server-Specific Findings
+
+### ansible-node (192.168.200.106)
+- ✅ Password auth: Disabled
+- ❌ Root login: **ENABLED**
+- ❌ Firewall: **NOT CONFIGURED**
+- ❌ Auto-updates: **NOT CONFIGURED**
+- Services: nginx (80, 443), MySQL (3306), Webmin (12321)
+
+### docker (192.168.200.200)
+- ✅ Root login: Disabled
+- ❌ Firewall: **INACTIVE**
+- ❌ Auto-updates: **NOT CONFIGURED**
+- ⚠️ Security updates: **65 PENDING**
+- Services: Many Docker containers on multiple ports
+
+### jenkins (192.168.200.91)
+- ✅ Firewall: Active (ports 22, 8080, 9000, 2222)
+- ⚠️ Password auth: **ENABLED** (intentional for AWS)
+- ⚠️ Keyboard-interactive: **ENABLED** (intentional)
+- Services: Jenkins (8080), SonarQube (9000)
+
+### npm (192.168.200.71)
+- ✅ Firewall: Active (ports 22, 80, 443, 81, 2222)
+- ✅ Password auth: Disabled
+- Services: Nginx Proxy Manager, OpenResty
+
+### hiveops, smartjournal, odoo
+- ⚠️ Firewall: **DISABLED** (intentional for Docker networking)
+- ❌ Auto-updates: **NOT CONFIGURED**
+- Multiple Docker services running
+
+### Proxmox Nodes (proxmox-00, 01, 02)
+- ✅ Firewall: Active
+- ⚠️ Root login: Enabled (may be required for Proxmox)
+- Services: Proxmox web interface
+
+## Immediate Actions Required
+
+### Priority 1 (Critical - Do Now)
+
+1. **Disable Root SSH Login**
+   ```bash
+   ansible all -m lineinfile -a "path=/etc/ssh/sshd_config regexp='^PermitRootLogin' line='PermitRootLogin no'" -b
+   ansible all -m service -a "name=sshd state=restarted" -b
+   ```
+
+2. **Apply Security Updates on Docker Server**
+   ```bash
+   ansible docker -m apt -a "upgrade=dist update_cache=yes" -b
+   ```
+
+3. **Enable Firewall on Critical Servers**
+   ```bash
+   # For servers without firewall
+   ansible ansible-node,gitea -m apt -a "name=ufw state=present" -b
+   ansible ansible-node,gitea -m ufw -a "rule=allow port=22 proto=tcp" -b
+   ansible ansible-node,gitea -m ufw -a "state=enabled" -b
+   ```
+
+### Priority 2 (High - This Week)
+
+4. **Enable Automatic Security Updates**
+   ```bash
+   ansible all -m apt -a "name=unattended-upgrades state=present" -b
+   ansible all -m copy -a "dest=/etc/apt/apt.conf.d/20auto-upgrades content='APT::Periodic::Update-Package-Lists \"1\";\nAPT::Periodic::Unattended-Upgrade \"1\";' mode=0644" -b
+   ```
+
+5. **Configure Firewall for Docker Server**
+   ```bash
+   ansible docker -m ufw -a "rule=allow port={{ item }} proto=tcp" -b
+   # Add specific ports needed for services
+   ```
+
+6. **Review and Secure Open Ports**
+   - Audit what services need external access
+   - Close unnecessary ports
+   - Use NPM proxy for web services
+
+### Priority 3 (Medium - This Month)
+
+7. **Implement Password Policy**
+   ```yaml
+   # In /etc/login.defs
+   PASS_MAX_DAYS 90
+   PASS_MIN_DAYS 1
+   PASS_MIN_LEN 12
+   PASS_WARN_AGE 7
+   ```
+
+8. **Enable Fail2Ban**
+   ```bash
+   ansible all -m apt -a "name=fail2ban state=present" -b
+   ```
+
+9. **Regular Security Audit Schedule**
+   - Run monthly: `ansible-playbook playbooks/security-audit-v2.yml`
+   - Review findings
+   - Track improvements
+
+## Positive Security Practices Found
+
+✅ **Jenkins Server**: Well-configured firewall with specific ports
+✅ **NPM Server**: Good firewall configuration, SSL certificates managed
+✅ **Most Servers**: Password SSH auth disabled (key-only)
+✅ **Most Servers**: Root login restricted
+✅ **Proxmox Nodes**: Firewalls active
+
+## Recommended Playbooks
+
+### security-hardening.yml (To Be Created)
+```yaml
+- Enable automatic security updates
+- Disable root SSH login (except where needed)
+- Configure UFW on all servers
+- Install fail2ban
+- Set password policies
+- Remove world-writable files
+```
+
+### security-monitoring.yml (To Be Created)
+```yaml
+- Monitor failed login attempts
+- Alert on unauthorized access
+- Track open ports
+- Monitor security updates
+```
+
+## Compliance Checklist
+
+- [ ] All servers have firewall enabled
+- [ ] Root SSH login disabled (except Proxmox)
+- [ ] Password authentication disabled (except where needed)
+- [ ] Automatic updates enabled
+- [ ] No pending critical security updates
+- [ ] Strong password policies enforced
+- [ ] Fail2Ban installed and configured
+- [ ] Regular security audits scheduled
+- [ ] SSH keys rotated (90 days)
+- [ ] Unnecessary services disabled
+
+## Next Steps
+
+1. **Review this report** with stakeholders
+2. **Execute Priority 1 actions** immediately
+3. **Schedule Priority 2 actions** for this week
+4. **Create remediation playbooks** for automation
+5. **Establish monthly security audit** routine
+6. **Document exceptions** (e.g., Jenkins password auth for AWS)
+
+## Resources
+
+- Full audit report: `/tmp/security-audit-full-report.txt`
+- Individual reports: `/tmp/security-audit-*/report.txt`
+- Audit playbook: `playbooks/security-audit-v2.yml`
+
+## Notes
+
+- Jenkins password auth is intentional for AWS Jenkins Master connection
+- Firewall disabled on hiveops/smartjournal/odoo due to Docker networking requirements
+- Proxmox root login may be required for management interface
+
+---
+
+**Generated**: 2026-02-09
+**Auditor**: Ansible Security Audit v2
+**Next Audit**: 2026-03-09 (monthly)
diff --git a/memory/dlx-ansible/MEMORY.md b/memory/dlx-ansible/MEMORY.md
new file mode 100644
index 0000000..87f69b5
--- /dev/null
+++ b/memory/dlx-ansible/MEMORY.md
@@ -0,0 +1,141 @@
+# Project Memory: dlx-ansible
+
+## Infrastructure Overview
+- **NPM Server**: nginx (192.168.200.71) - Nginx Proxy Manager for SSL termination
+- **Application Servers**: hiveops (192.168.200.112), smartjournal (192.168.200.114)
+- **CI/CD Server**: jenkins (192.168.200.91) - Jenkins + SonarQube
+- All servers use `dlxadmin` user with passwordless sudo
+
+## Critical Learnings
+
+### SSL Certificate Offloading with Nginx Proxy Manager
+
+**Problem**: Spring Boot applications behind NPM experience redirect loops when accessed via HTTPS.
+
+**Root Cause**: Spring Boot doesn't trust `X-Forwarded-*` headers by default. When NPM terminates SSL and forwards HTTP to backend, Spring sees HTTP and redirects to HTTPS, creating infinite loop.
+
+**Solution**: Configure Spring Boot to trust forwarded headers:
+```yaml
+environment:
+  SERVER_FORWARD_HEADERS_STRATEGY: native
+  SERVER_USE_FORWARD_HEADERS: true
+```
+
+**Key Points**:
+- Containers must be **recreated** (not restarted) for env vars to take effect
+- Verify with: `curl -I -H 'X-Forwarded-Proto: https' http://localhost:8080/`
+- Success indicator: `Strict-Transport-Security` header in response
+- Documentation: `docs/SSL-OFFLOADING-FIX.md`
+
+### Docker Compose Best Practices
+
+**Environment Variable Loading**:
+- Use `--env-file` flag when .env is not in same directory as compose file
+- Example: `docker compose -f docker/docker-compose.yml --env-file .env up -d`
+
+**Container Updates**:
+- Restart: Keeps existing container, doesn't apply env changes
+- Recreate: Removes old container, creates new one with latest env/config
+- Always recreate when changing environment variables
+
+### HiveOps Application Structure
+
+**Main Deployment** (`/opt/hiveops-deploy/`):
+- Full microservices stack
+- Services: incident-backend, incident-frontend, mgmt, remote
+- Managed via docker-compose
+
+**Standalone Deployment** (`/home/hiveops/`):
+- Simplified incident management system
+- Separate from main deployment
+- Used for direct hiveops.directlx.dev access
+
+### Jenkins Firewall Blocking (2026-02-09)
+
+**Problem**: Jenkins and SonarQube were unreachable from network.
+
+**Root Cause**: Server had no host_vars file, inherited default firewall config (SSH only).
+
+**Solution**: Created `host_vars/jenkins.yml` with ports 22, 8080 (Jenkins), 9000 (SonarQube).
+
+**Quick Fix**:
+```bash
+ansible jenkins -m community.general.ufw -a "rule=allow port=8080 proto=tcp" -b
+ansible jenkins -m community.general.ufw -a "rule=allow port=9000 proto=tcp" -b
+ansible jenkins -m shell -a "docker start postgresql sonarqube" -b
+```
+
+**Key Points**:
+- Jenkins runs as Java system service (not Docker) on port 8080
+- SonarQube runs in Docker with PostgreSQL backend
+- Always create host_vars file for servers with specific firewall needs
+- Documentation: `docs/JENKINS-CONNECTIVITY-FIX.md`
+
+## File Locations
+
+### Host Variables
+- `/source/dlx-src/dlx-ansible/host_vars/npm.yml` - NPM firewall config
+- `/source/dlx-src/dlx-ansible/host_vars/hiveops.yml` - HiveOps settings
+- `/source/dlx-src/dlx-ansible/host_vars/smartjournal.yml` - SmartJournal settings
+- `/source/dlx-src/dlx-ansible/host_vars/jenkins.yml` - Jenkins/SonarQube firewall config
+
+### Playbooks Created
+- `playbooks/fix-hiveops-ssl-offload.yml` - SSL offload fix automation
+- `playbooks/fix-hiveops-compose-indentation.yml` - Compose file corrections
+- `playbooks/fix-hiveops-mgmt-ssl.yml` - Management service SSL fix
+
+### Templates
+- `templates/hiveops-docker-compose.prod.yml.j2` - Corrected compose template
+
+## Storage Remediation (2026-02-08)
+
+**Critical Issues Identified**:
+1. proxmox-00 root FS: 84.5% full (CRITICAL)
+2. proxmox-01 dlx-docker: 81.1% full (HIGH)
+3. Unused containers: 1.2 TB allocated
+4. SonarQube: 354 GB (82% of allocation)
+
+**Remediation Playbooks Created**:
+- `remediate-storage-critical-issues.yml`: Log cleanup, Docker prune, audits
+- `remediate-docker-storage.yml`: Deep Docker cleanup + automation
+- `remediate-stopped-containers.yml`: Safe container removal with backups
+- `configure-storage-monitoring.yml`: Proactive monitoring (5/10 min checks)
+
+**Documentation**:
+- `STORAGE-AUDIT.md`: Full hardware/storage analysis (550 lines)
+- `STORAGE-REMEDIATION-GUIDE.md`: Step-by-step execution (480 lines)
+- `REMEDIATION-SUMMARY.md`: Quick reference (300 lines)
+
+**Expected Results**:
+- Total space freed: 1-2 TB
+- proxmox-00: 84.5% → 70% (10-15 GB freed)
+- proxmox-01: 81.1% → 70% (50-150 GB freed)
+- Automation prevents regrowth (weekly prune + hourly monitoring)
+
+**Commit**: 90ed5c1
+
+## Common Tasks
+
+### Fix SSL Offloading for Spring Boot Service
+1. Add env vars to .env: `SERVER_FORWARD_HEADERS_STRATEGY=native`, `SERVER_USE_FORWARD_HEADERS=true`
+2. Add to docker-compose environment section
+3. Recreate container: `docker stop && docker rm && docker compose up -d `
+4. Verify: Check for `Strict-Transport-Security` header
+
+### Apply Firewall Configuration
+- Firewall is managed by common role (roles/common/tasks/security.yml)
+- Controlled per-host via `common_firewall_enabled` and `common_firewall_allowed_ports`
+- Some hosts (docker, hiveops, smartjournal) have firewall disabled for Docker networking
+
+### Run Storage Remediation
+1. Test with `--check`: `ansible-playbook playbooks/remediate-storage-critical-issues.yml --check`
+2. Deploy monitoring: `ansible-playbook playbooks/configure-storage-monitoring.yml -l proxmox`
+3. Fix proxmox-00: `ansible-playbook playbooks/remediate-storage-critical-issues.yml -l proxmox-00`
+4. Fix proxmox-01: `ansible-playbook playbooks/remediate-docker-storage.yml -l proxmox-01`
+5. Monitor: `tail -f /var/log/storage-monitor.log`
+6. Remove containers (optional): `ansible-playbook playbooks/remediate-stopped-containers.yml -e dry_run=false`
+
+## Security Notes
+- Only trust forwarded headers when backend is not internet-accessible
+- NPM server (192.168.200.71) should be only server that can reach backend ports
+- Backend ports should bind to localhost only: `127.0.0.1:8080:8080`
diff --git a/project-configs/dlx-ansible/CLAUDE.md b/project-configs/dlx-ansible/CLAUDE.md
new file mode 100644
index 0000000..51442f8
--- /dev/null
+++ b/project-configs/dlx-ansible/CLAUDE.md
@@ -0,0 +1,25 @@
+# CLAUDE.md - dlx-ansible Project
+
+Infrastructure as Code for DirectLX - Ansible playbooks for managing Proxmox-based homelab.
+
+## Infrastructure
+
+16 servers: 3x Proxmox, 3x databases, Jenkins, Gitea, NPM, Docker host, Pi-hole, applications
+
+## Key Commands
+
+```bash
+# Run playbooks
+ansible-playbook playbooks/site.yml
+ansible-playbook playbooks/security-audit-v2.yml
+
+# Ad-hoc
+ansible all -m ping
+ansible all -m shell -a "uptime" -b
+```
+
+## Critical Knowledge
+
+See `memory/dlx-ansible/MEMORY.md` for infrastructure details, fixes, and learnings.
+
+**Last Updated**: 2026-02-09