Initial Claude configuration and memory for dlx-ansible

Add Claude Code configurations:
- Memory file with infrastructure knowledge and critical learnings
- Project-specific CLAUDE.md with commands and patterns
- Security audit summary documentation
- Repository structure and documentation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
directlx 2026-02-09 13:33:05 -05:00
parent d23b3e4662
commit 1d9896e6a1
4 changed files with 433 additions and 0 deletions


@ -1,2 +1,39 @@
# dlx-claude
Claude Code configurations, memory, and documentation for DirectLX infrastructure projects.
## Purpose
This repository stores Claude-specific files that help Claude Code understand and work effectively with DirectLX infrastructure:
- **Memory files**: Persistent knowledge about infrastructure, issues encountered, and solutions
- **Project configurations**: CLAUDE.md files with project-specific guidance
- **Documentation**: Security audits, troubleshooting guides, and best practices
## Repository Structure
```
dlx-claude/
├── memory/
│   └── dlx-ansible/                 # Claude's memory for Ansible project
│       └── MEMORY.md                # Infrastructure knowledge, learnings, fixes
├── project-configs/
│   └── dlx-ansible/                 # Project-specific Claude configuration
│       └── CLAUDE.md                # Instructions for working with dlx-ansible
├── documentation/
│   └── security/                    # Security-related documentation
│       └── SECURITY-AUDIT-SUMMARY.md
└── README.md                        # This file
```
## Last Updated
2026-02-09
---
**Repository**: http://192.168.200.102/directlx/dlx-claude
**Gitea Server**: 192.168.200.102


@ -0,0 +1,230 @@
# Security Audit Summary
**Date**: 2026-02-09
**Servers Audited**: 16
**Full Report**: `/tmp/security-audit-full-report.txt`
## Executive Summary
A security audit was completed across all infrastructure servers. It identified multiple security concerns; the findings summarized below range from **CRITICAL** to **MEDIUM** priority.
## Critical Security Findings
### 🔴 CRITICAL
1. **Root Login Enabled via SSH** (`ansible-node`, `gitea`)
- **Risk**: Direct root access increases attack surface
- **Affected**: 2 servers
- **Recommendation**: Disable root login immediately
```text
# /etc/ssh/sshd_config
PermitRootLogin no
```
2. **No Firewall on Multiple Servers**
- **Risk**: All ports exposed to network
- **Affected**: `ansible-node`, `gitea`, and others
- **Recommendation**: Enable UFW with strict rules
3. **Password Authentication Enabled on Jenkins**
- **Risk**: Password logins are open to brute-force attempts; this was enabled deliberately for temporary AWS access
- **Status**: Known configuration (for AWS Jenkins Master)
- **Recommendation**: Switch to key-based auth when possible
### 🟠 HIGH
4. **Automatic Updates Not Configured**
- **Risk**: Servers missing security patches
- **Affected**: `ansible-node`, `docker`, and most servers
- **Recommendation**: Enable unattended-upgrades
5. **Security Updates Available**
- **Critical**: `docker` has **65 pending security updates**
- **Recommendation**: Apply immediately
```bash
ansible docker -m apt -a "upgrade=dist update_cache=yes" -b
```
6. **Multiple Services Exposed on Docker Server**
- **Risk**: Ports 5000, 8000-8082, 8443, 9000, 11434 publicly accessible
- **Firewall**: Currently disabled
- **Recommendation**: Enable firewall, restrict to internal network
### 🟡 MEDIUM
7. **Password-Based Users on Multiple Servers**
- **Users with passwords**: root, dlxadmin, directlx, jenkins
- **Risk**: Potential brute-force targets
- **Recommendation**: Enforce strong password policies
8. **PermitRootLogin Enabled**
- **Affected**: Several Proxmox nodes
- **Risk**: Root SSH access possible
- **Recommendation**: Disable after confirming Proxmox compatibility
## Server-Specific Findings
### ansible-node (192.168.200.106)
- ✅ Password auth: Disabled
- ❌ Root login: **ENABLED**
- ❌ Firewall: **NOT CONFIGURED**
- ❌ Auto-updates: **NOT CONFIGURED**
- Services: nginx (80, 443), MySQL (3306), Webmin (12321)
### docker (192.168.200.200)
- ✅ Root login: Disabled
- ❌ Firewall: **INACTIVE**
- ❌ Auto-updates: **NOT CONFIGURED**
- ⚠️ Security updates: **65 PENDING**
- Services: Many Docker containers on multiple ports
### jenkins (192.168.200.91)
- ✅ Firewall: Active (ports 22, 8080, 9000, 2222)
- ⚠️ Password auth: **ENABLED** (intentional for AWS)
- ⚠️ Keyboard-interactive: **ENABLED** (intentional)
- Services: Jenkins (8080), SonarQube (9000)
### npm (192.168.200.71)
- ✅ Firewall: Active (ports 22, 80, 443, 81, 2222)
- ✅ Password auth: Disabled
- Services: Nginx Proxy Manager, OpenResty
### hiveops, smartjournal, odoo
- ⚠️ Firewall: **DISABLED** (intentional for Docker networking)
- ❌ Auto-updates: **NOT CONFIGURED**
- Multiple Docker services running
### Proxmox Nodes (proxmox-00, 01, 02)
- ✅ Firewall: Active
- ⚠️ Root login: Enabled (may be required for Proxmox)
- Services: Proxmox web interface
## Immediate Actions Required
### Priority 1 (Critical - Do Now)
1. **Disable Root SSH Login**
```bash
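# Exclude the Proxmox nodes, where root login may be required
ansible 'all:!proxmox' -m lineinfile -a "path=/etc/ssh/sshd_config regexp='^PermitRootLogin' line='PermitRootLogin no'" -b
# The unit is named ssh on Debian/Ubuntu (sshd on RHEL-family hosts)
ansible 'all:!proxmox' -m service -a "name=ssh state=restarted" -b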
```
2. **Apply Security Updates on Docker Server**
```bash
ansible docker -m apt -a "upgrade=dist update_cache=yes" -b
```
3. **Enable Firewall on Critical Servers**
```bash
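# For servers without a firewall. Allow each host's service ports too
# (e.g. 80/443 for nginx on ansible-node) before enabling, or they become unreachable.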
ansible ansible-node,gitea -m apt -a "name=ufw state=present" -b
ansible ansible-node,gitea -m ufw -a "rule=allow port=22 proto=tcp" -b
ansible ansible-node,gitea -m ufw -a "state=enabled" -b
```
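A quick way to confirm the root-login change took effect is to read back the value sshd will use. The sketch below is a hypothetical helper (`permit_root_login_value` is not part of the audit playbook) that prints the first uncommented `PermitRootLogin` value in an `sshd_config`-style file, since sshd applies the first value it obtains for each keyword. It ignores `Match` blocks and included files, so treat it as a rough check only.

```bash
# Hypothetical helper: print the first uncommented PermitRootLogin value
# found in the sshd_config-style file given as $1 (lowercased).
# Commented lines are skipped because their first field starts with "#".
permit_root_login_value() {
    awk 'tolower($1) == "permitrootlogin" { print tolower($2); exit }' "$1"
}

# Example against a throwaway file rather than the live config.
sample=$(mktemp)
printf '#PermitRootLogin yes\nPermitRootLogin no\n' > "$sample"
permit_root_login_value "$sample"
rm -f "$sample"
```

On a live host, `sudo sshd -T | grep -i permitrootlogin` reports the effective setting, including any included files.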
### Priority 2 (High - This Week)
4. **Enable Automatic Security Updates**
```bash
ansible all -m apt -a "name=unattended-upgrades state=present" -b
ansible all -m copy -a "dest=/etc/apt/apt.conf.d/20auto-upgrades content='APT::Periodic::Update-Package-Lists \"1\";\nAPT::Periodic::Unattended-Upgrade \"1\";' mode=0644" -b
```
5. **Configure Firewall for Docker Server**
```bash
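# Ad-hoc mode has no loop, so {{ item }} would be undefined; iterate in the shell.
# Replace the example ports with the ones the services actually need.
for port in 8443 9000; do ansible docker -m ufw -a "rule=allow port=${port} proto=tcp" -b; done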
```
6. **Review and Secure Open Ports**
- Audit what services need external access
- Close unnecessary ports
- Use NPM proxy for web services
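To confirm that item 4 landed as intended, the file written by the `copy` command can be checked for both periodic settings. This is a minimal sketch, assuming the file keeps one setting per line; `auto_upgrades_enabled` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if the apt periodic config file in $1
# turns on both the package-list refresh and the unattended upgrade run.
auto_upgrades_enabled() {
    grep -Eq '^[[:space:]]*APT::Periodic::Update-Package-Lists[[:space:]]+"1";' "$1" &&
        grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"1";' "$1"
}

# Example against a throwaway copy of the expected content.
sample=$(mktemp)
printf 'APT::Periodic::Update-Package-Lists "1";\nAPT::Periodic::Unattended-Upgrade "1";\n' > "$sample"
if auto_upgrades_enabled "$sample"; then
    echo "auto-updates: enabled"
else
    echo "auto-updates: NOT CONFIGURED"
fi
rm -f "$sample"
```

On a managed host the argument would be `/etc/apt/apt.conf.d/20auto-upgrades`.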
### Priority 3 (Medium - This Month)
7. **Implement Password Policy**
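```text
# In /etc/login.defs
PASS_MAX_DAYS 90
PASS_MIN_DAYS 1
# PASS_MIN_LEN is not honored on PAM-based systems such as Debian/Ubuntu;
# enforce the minimum length through PAM (e.g. pam_pwquality minlen) as well
PASS_MIN_LEN 12
PASS_WARN_AGE 7
```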
8. **Enable Fail2Ban**
```bash
ansible all -m apt -a "name=fail2ban state=present" -b
```
9. **Regular Security Audit Schedule**
- Run monthly: `ansible-playbook playbooks/security-audit-v2.yml`
- Review findings
- Track improvements
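The aging values in item 7 can be read back from `/etc/login.defs` to confirm that a host matches the policy. This sketch uses a hypothetical helper, `login_defs_value`, and only handles the simple `KEY value` lines that file uses.

```bash
# Hypothetical helper: print the value of key $1 from the login.defs-style
# file $2. Comment lines never match because their first field starts with "#".
login_defs_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Example against a throwaway file rather than the live config.
sample=$(mktemp)
printf '# PASS_MAX_DAYS 99999\nPASS_MAX_DAYS\t90\nPASS_WARN_AGE 7\n' > "$sample"
login_defs_value PASS_MAX_DAYS "$sample"
rm -f "$sample"
```

These settings apply to accounts created after the change; existing accounts keep their current aging values until adjusted with `chage`.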
## Positive Security Practices Found
- **Jenkins Server**: Well-configured firewall with specific ports
- **NPM Server**: Good firewall configuration, SSL certificates managed
- **Most Servers**: Password SSH auth disabled (key-only)
- **Most Servers**: Root login restricted
- **Proxmox Nodes**: Firewalls active
## Recommended Playbooks
### security-hardening.yml (To Be Created)
```yaml
- Enable automatic security updates
- Disable root SSH login (except where needed)
- Configure UFW on all servers
- Install fail2ban
- Set password policies
- Remove world-writable files
```
### security-monitoring.yml (To Be Created)
```yaml
- Monitor failed login attempts
- Alert on unauthorized access
- Track open ports
- Monitor security updates
```
## Compliance Checklist
- [ ] All servers have firewall enabled
- [ ] Root SSH login disabled (except Proxmox)
- [ ] Password authentication disabled (except where needed)
- [ ] Automatic updates enabled
- [ ] No pending critical security updates
- [ ] Strong password policies enforced
- [ ] Fail2Ban installed and configured
- [ ] Regular security audits scheduled
- [ ] SSH keys rotated (90 days)
- [ ] Unnecessary services disabled
## Next Steps
1. **Review this report** with stakeholders
2. **Execute Priority 1 actions** immediately
3. **Schedule Priority 2 actions** for this week
4. **Create remediation playbooks** for automation
5. **Establish monthly security audit** routine
6. **Document exceptions** (e.g., Jenkins password auth for AWS)
## Resources
- Full audit report: `/tmp/security-audit-full-report.txt`
- Individual reports: `/tmp/security-audit-*/report.txt`
- Audit playbook: `playbooks/security-audit-v2.yml`
## Notes
- Jenkins password auth is intentional for AWS Jenkins Master connection
- Firewall disabled on hiveops/smartjournal/odoo due to Docker networking requirements
- Proxmox root login may be required for management interface
---
**Generated**: 2026-02-09
**Auditor**: Ansible Security Audit v2
**Next Audit**: 2026-03-09 (monthly)


@ -0,0 +1,141 @@
# Project Memory: dlx-ansible
## Infrastructure Overview
- **NPM Server**: npm (192.168.200.71) - Nginx Proxy Manager for SSL termination
- **Application Servers**: hiveops (192.168.200.112), smartjournal (192.168.200.114)
- **CI/CD Server**: jenkins (192.168.200.91) - Jenkins + SonarQube
- All servers use `dlxadmin` user with passwordless sudo
## Critical Learnings
### SSL Certificate Offloading with Nginx Proxy Manager
**Problem**: Spring Boot applications behind NPM experience redirect loops when accessed via HTTPS.
**Root Cause**: Spring Boot doesn't trust `X-Forwarded-*` headers by default. When NPM terminates SSL and forwards plain HTTP to the backend, Spring sees an HTTP request and redirects it to HTTPS, creating an infinite loop.
**Solution**: Configure Spring Boot to trust forwarded headers:
```yaml
environment:
  SERVER_FORWARD_HEADERS_STRATEGY: native
  # Quote booleans so Compose passes them through as strings
  SERVER_USE_FORWARD_HEADERS: "true"
```
**Key Points**:
- Containers must be **recreated** (not restarted) for env vars to take effect
- Verify with: `curl -I -H 'X-Forwarded-Proto: https' http://localhost:8080/`
- Success indicator: `Strict-Transport-Security` header in response
- Documentation: `docs/SSL-OFFLOADING-FIX.md`
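The verification step can be scripted so that it yields a pass/fail result. The sketch below assumes the response headers arrive on standard input, for example from the `curl -I` command above; `has_hsts` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if the HTTP response headers on stdin
# include a Strict-Transport-Security header (header names are case-insensitive).
has_hsts() {
    grep -qi '^strict-transport-security:'
}

# Example with canned headers; on a real host, pipe curl's output instead:
#   curl -sI -H 'X-Forwarded-Proto: https' http://localhost:8080/ | has_hsts
printf 'HTTP/1.1 200 OK\r\nStrict-Transport-Security: max-age=31536000\r\n\r\n' | has_hsts \
    && echo "HSTS header present" \
    || echo "HSTS header missing"
```

A missing header after recreating the container suggests the environment variables did not reach the Spring Boot process.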
### Docker Compose Best Practices
**Environment Variable Loading**:
- Use the `--env-file` flag when `.env` is not in the same directory as the compose file
- Example: `docker compose -f docker/docker-compose.yml --env-file .env up -d`
**Container Updates**:
- Restart: Keeps existing container, doesn't apply env changes
- Recreate: Removes old container, creates new one with latest env/config
- Always recreate when changing environment variables
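For Compose-managed services, `docker compose up -d --force-recreate` performs the remove-and-create step in one command. The sketch below only prints the command so it can be reviewed before running; `recreate_command` is a hypothetical helper, and the compose file, env file, and service name in the example are placeholders.

```bash
# Hypothetical helper: print (not run) the command that recreates one service.
#   $1 = compose file, $2 = env file, $3 = service name
# Arguments are printed as-is, so paths containing spaces would need quoting.
recreate_command() {
    printf 'docker compose -f %s --env-file %s up -d --force-recreate %s\n' "$1" "$2" "$3"
}

# Placeholder arguments for illustration.
recreate_command docker/docker-compose.yml .env incident-backend
```

Printing first keeps the helper side-effect free; the output can be pasted into a shell once it looks right.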
### HiveOps Application Structure
**Main Deployment** (`/opt/hiveops-deploy/`):
- Full microservices stack
- Services: incident-backend, incident-frontend, mgmt, remote
- Managed via docker-compose
**Standalone Deployment** (`/home/hiveops/`):
- Simplified incident management system
- Separate from main deployment
- Used for direct hiveops.directlx.dev access
### Jenkins Firewall Blocking (2026-02-09)
**Problem**: Jenkins and SonarQube were unreachable from network.
**Root Cause**: The server had no host_vars file, so it inherited the default firewall config (SSH only).
**Solution**: Created `host_vars/jenkins.yml` with ports 22, 8080 (Jenkins), 9000 (SonarQube).
**Quick Fix**:
```bash
ansible jenkins -m community.general.ufw -a "rule=allow port=8080 proto=tcp" -b
ansible jenkins -m community.general.ufw -a "rule=allow port=9000 proto=tcp" -b
ansible jenkins -m shell -a "docker start postgresql sonarqube" -b
```
**Key Points**:
- Jenkins runs as Java system service (not Docker) on port 8080
- SonarQube runs in Docker with PostgreSQL backend
- Always create host_vars file for servers with specific firewall needs
- Documentation: `docs/JENKINS-CONNECTIVITY-FIX.md`
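After applying the quick fix, the rule set can be checked from `ufw status` output. This is a sketch, assuming the default (non-verbose) `ufw status` table layout; `ufw_allows_tcp` is a hypothetical helper that reads that output on standard input.

```bash
# Hypothetical helper: succeed if the `ufw status` output on stdin has an
# ALLOW rule for TCP port $1. Only plain "PORT/tcp  ALLOW  ..." rows are checked.
ufw_allows_tcp() {
    awk -v rule="$1/tcp" '$1 == rule && $2 == "ALLOW" { found = 1 } END { exit !found }'
}

# Example with canned output; on the host, use: sudo ufw status | ufw_allows_tcp 8080
status='Status: active

To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere
8080/tcp                   ALLOW       Anywhere'

for port in 8080 9000; do
    if printf '%s\n' "$status" | ufw_allows_tcp "$port"; then
        echo "$port/tcp allowed"
    else
        echo "$port/tcp blocked"
    fi
done
```

The match is exact on the first column, so a rule for `8080/tcp` does not count as a rule for `80/tcp`.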
## File Locations
### Host Variables
- `/source/dlx-src/dlx-ansible/host_vars/npm.yml` - NPM firewall config
- `/source/dlx-src/dlx-ansible/host_vars/hiveops.yml` - HiveOps settings
- `/source/dlx-src/dlx-ansible/host_vars/smartjournal.yml` - SmartJournal settings
- `/source/dlx-src/dlx-ansible/host_vars/jenkins.yml` - Jenkins/SonarQube firewall config
### Playbooks Created
- `playbooks/fix-hiveops-ssl-offload.yml` - SSL offload fix automation
- `playbooks/fix-hiveops-compose-indentation.yml` - Compose file corrections
- `playbooks/fix-hiveops-mgmt-ssl.yml` - Management service SSL fix
### Templates
- `templates/hiveops-docker-compose.prod.yml.j2` - Corrected compose template
## Storage Remediation (2026-02-08)
**Critical Issues Identified**:
1. proxmox-00 root FS: 84.5% full (CRITICAL)
2. proxmox-01 dlx-docker: 81.1% full (HIGH)
3. Unused containers: 1.2 TB allocated
4. SonarQube: 354 GB (82% of allocation)
**Remediation Playbooks Created**:
- `remediate-storage-critical-issues.yml`: Log cleanup, Docker prune, audits
- `remediate-docker-storage.yml`: Deep Docker cleanup + automation
- `remediate-stopped-containers.yml`: Safe container removal with backups
- `configure-storage-monitoring.yml`: Proactive monitoring (5/10 min checks)
**Documentation**:
- `STORAGE-AUDIT.md`: Full hardware/storage analysis (550 lines)
- `STORAGE-REMEDIATION-GUIDE.md`: Step-by-step execution (480 lines)
- `REMEDIATION-SUMMARY.md`: Quick reference (300 lines)
**Expected Results**:
- Total space freed: 1-2 TB
- proxmox-00: 84.5% → 70% (10-15 GB freed)
- proxmox-01: 81.1% → 70% (50-150 GB freed)
- Automation prevents regrowth (weekly prune + hourly monitoring)
**Commit**: 90ed5c1
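The usage figures behind these findings can be spot-checked by hand. The sketch below filters POSIX `df -P` output for filesystems at or above a given usage percentage; `over_threshold` is a hypothetical helper and is not taken from the monitoring playbook, whose actual checks are not shown here.

```bash
# Hypothetical helper: read `df -P` output on stdin and print "mount use%"
# for every filesystem whose usage is at or above the percentage in $1.
# Mount points containing spaces are not handled.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example with canned output; on a host, use: df -P | over_threshold 80
printf '%s\n' \
    'Filesystem 1024-blocks Used Available Capacity Mounted on' \
    '/dev/sda1 100000 85000 15000 85% /' \
    '/dev/sdb1 100000 40000 60000 40% /data' | over_threshold 80
```

`df` reports whole percentages, so values such as 84.5% will not appear with their decimal part.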
## Common Tasks
### Fix SSL Offloading for Spring Boot Service
1. Add env vars to .env: `SERVER_FORWARD_HEADERS_STRATEGY=native`, `SERVER_USE_FORWARD_HEADERS=true`
2. Add to docker-compose environment section
3. Recreate container: `docker stop <name> && docker rm <name> && docker compose up -d <service>`
4. Verify: Check for `Strict-Transport-Security` header
### Apply Firewall Configuration
- Firewall is managed by common role (roles/common/tasks/security.yml)
- Controlled per-host via `common_firewall_enabled` and `common_firewall_allowed_ports`
- Some hosts (docker, hiveops, smartjournal) have firewall disabled for Docker networking
### Run Storage Remediation
1. Test with `--check`: `ansible-playbook playbooks/remediate-storage-critical-issues.yml --check`
2. Deploy monitoring: `ansible-playbook playbooks/configure-storage-monitoring.yml -l proxmox`
3. Fix proxmox-00: `ansible-playbook playbooks/remediate-storage-critical-issues.yml -l proxmox-00`
4. Fix proxmox-01: `ansible-playbook playbooks/remediate-docker-storage.yml -l proxmox-01`
5. Monitor: `tail -f /var/log/storage-monitor.log`
6. Remove containers (optional): `ansible-playbook playbooks/remediate-stopped-containers.yml -e dry_run=false`
## Security Notes
- Only trust forwarded headers when backend is not internet-accessible
- NPM server (192.168.200.71) should be the only server that can reach backend ports
- Backend ports should bind to localhost only: `127.0.0.1:8080:8080`
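The last point can be checked mechanically for a list of published-port mappings. This is a sketch, assuming the short `HOST_IP:HOST_PORT:CONTAINER_PORT` syntax shown above; `is_loopback_binding` is a hypothetical helper.

```bash
# Hypothetical helper: succeed if a short-syntax port mapping publishes
# only on the IPv4 loopback address (e.g. 127.0.0.1:8080:8080).
is_loopback_binding() {
    case "$1" in
        127.0.0.1:*:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample mappings for illustration.
for mapping in 127.0.0.1:8080:8080 8080:8080 0.0.0.0:9000:9000; do
    if is_loopback_binding "$mapping"; then
        echo "$mapping: loopback only"
    else
        echo "$mapping: reachable from other hosts"
    fi
done
```

A mapping without a host IP publishes on all interfaces, which is why `8080:8080` is reported as reachable.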


@ -0,0 +1,25 @@
# CLAUDE.md - dlx-ansible Project
Infrastructure as Code for DirectLX - Ansible playbooks for managing a Proxmox-based homelab.
## Infrastructure
16 servers: 3x Proxmox, 3x databases, Jenkins, Gitea, NPM, Docker host, Pi-hole, applications
## Key Commands
```bash
# Run playbooks
ansible-playbook playbooks/site.yml
ansible-playbook playbooks/security-audit-v2.yml
# Ad-hoc
ansible all -m ping
ansible all -m shell -a "uptime" -b
```
## Critical Knowledge
See `memory/dlx-ansible/MEMORY.md` for infrastructure details, fixes, and learnings.
**Last Updated**: 2026-02-09