# CLAUDE.md - dlx-ansible

Infrastructure as Code for DirectLX - Ansible playbooks, roles, and inventory for managing a Proxmox-based homelab infrastructure with multiple services.

## Project Overview

This repository manages 16 servers across Proxmox hypervisors, databases, web services, infrastructure services, and applications using Ansible automation.

## Infrastructure

### Server Inventory

**Proxmox Cluster**:
- proxmox-00 (192.168.200.10) - Primary hypervisor
- proxmox-01 (192.168.200.11) - Secondary hypervisor
- proxmox-02 (192.168.200.12) - Tertiary hypervisor

**Database Servers**:
- postgres (192.168.200.103) - PostgreSQL database
- mysql (192.168.200.110) - MySQL/MariaDB database
- mongo (192.168.200.111) - MongoDB database

**Web/Proxy Servers**:
- nginx (192.168.200.65) - Web server
- npm (192.168.200.71) - Nginx Proxy Manager for SSL termination

**Infrastructure Services**:
- docker (192.168.200.200) - Docker host for various containerized services
- pihole (192.168.200.100) - DNS server and ad-blocking
- gitea (192.168.200.102) - Self-hosted Git service
- jenkins (192.168.200.91) - CI/CD server + SonarQube

**Application Servers**:
- hiveops (192.168.200.112) - HiveOps incident management (Spring Boot)
- smartjournal (192.168.200.114) - Journal tracking application
- odoo (192.168.200.61) - ERP system

**Control**:
- ansible-node (192.168.200.106) - Ansible control node

### Common Access Patterns

- **User**: dlxadmin (passwordless sudo on all servers)
- **SSH**: Key-based authentication (password disabled on most servers)
- **Exception**: Jenkins server has password auth enabled for AWS Jenkins Master connection
- **Firewall**: UFW managed via common role

## Quick Start Commands

### Basic Ansible Operations

```bash
# Check connectivity to all servers
ansible all -m ping

# Check connectivity to specific group
ansible webservers -m ping

# Run ad-hoc command
ansible all -m shell -a "uptime" -b

# Gather facts about servers
ansible all -m setup
```

### Playbook Execution

```bash
# Run main site playbook
ansible-playbook playbooks/site.yml

# Limit to specific servers
ansible-playbook playbooks/site.yml -l jenkins,npm

# Limit to server group
ansible-playbook playbooks/site.yml -l webservers

# Use tags
ansible-playbook playbooks/site.yml --tags firewall

# Dry run (check mode)
ansible-playbook playbooks/site.yml --check

# Verbose output
ansible-playbook playbooks/site.yml -v
ansible-playbook playbooks/site.yml -vvv  # very verbose
```
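
The flags above compose naturally into a cautious rollout: a syntax check, then a dry run with a diff, then the real run. The following is a minimal sketch of that sequence, not something that exists in this repository. The function name `safe_apply` and the `ANSIBLE_PLAYBOOK` override variable are assumptions made for illustration; `--syntax-check`, `--check`, `--diff`, and `-l` are standard `ansible-playbook` options.

```bash
#!/bin/sh
# Hypothetical helper (not part of this repo): syntax check, dry run, then apply.
# ANSIBLE_PLAYBOOK is an assumed override hook, e.g. ANSIBLE_PLAYBOOK=echo to preview.
: "${ANSIBLE_PLAYBOOK:=ansible-playbook}"

safe_apply() {
    playbook="$1"   # e.g. playbooks/site.yml
    limit="$2"      # e.g. jenkins,npm or webservers

    if [ -z "$playbook" ] || [ -z "$limit" ]; then
        echo "usage: safe_apply <playbook> <limit>" >&2
        return 2
    fi

    # Stop at the first failing stage so a broken dry run never reaches the real run.
    "$ANSIBLE_PLAYBOOK" "$playbook" --syntax-check || return 1
    "$ANSIBLE_PLAYBOOK" "$playbook" -l "$limit" --check --diff || return 1
    "$ANSIBLE_PLAYBOOK" "$playbook" -l "$limit"
}
```

After sourcing the file, `safe_apply playbooks/site.yml jenkins` would run the three stages against the `jenkins` host only. Requiring a limit is a deliberate choice in this sketch, so that an unscoped run has to be typed out explicitly.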

### Security Operations

```bash
# Run comprehensive security audit
ansible-playbook playbooks/security-audit-v2.yml

# View audit results
cat /tmp/security-audit-*/report.txt
cat docs/SECURITY-AUDIT-SUMMARY.md

# Apply security updates
ansible all -m apt -a "update_cache=yes upgrade=dist" -b

# Check firewall status
ansible all -m shell -a "ufw status verbose" -b

# Configure Docker server firewall (when ready)
ansible-playbook playbooks/secure-docker-server-firewall.yml
```
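
Because `cat /tmp/security-audit-*/report.txt` prints every report that matches the glob, repeated audits can produce one long concatenated output. The sketch below shows one way to pick only the newest report. It assumes each audit run writes into its own `security-audit-*` directory, as the glob above suggests; the function name `latest_audit_report` is hypothetical.

```bash
#!/bin/sh
# Hypothetical helper: print the path of the most recently modified audit report.
# Assumes the layout <base>/security-audit-*/report.txt implied by the glob above.
latest_audit_report() {
    base="${1:-/tmp}"
    # ls -t sorts by modification time, newest first; -d lists the directories themselves.
    newest_dir="$(ls -td "$base"/security-audit-*/ 2>/dev/null | head -n 1)"
    if [ -z "$newest_dir" ] || [ ! -f "${newest_dir}report.txt" ]; then
        echo "no audit report found under $base" >&2
        return 1
    fi
    printf '%s\n' "${newest_dir}report.txt"
}
```

With the function sourced, `cat "$(latest_audit_report)"` would show just the latest report. The sketch relies on directory modification times, so it may pick the wrong directory if an older audit directory is edited by hand.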

### Server Management

```bash
# Reboot servers
ansible all -m reboot -b

# Check disk space
ansible all -m shell -a "df -h" -b

# Check memory usage
ansible all -m shell -a "free -h" -b

# Check running services
ansible all -m shell -a "systemctl status" -b

# Refresh the package cache (does not upgrade any packages)
ansible all -m apt -a "update_cache=yes" -b
```

## Directory Structure

```
dlx-ansible/
├── inventory/
│   └── hosts.yml                    # Server inventory with IPs and groups
│
├── host_vars/                       # Per-host configuration
│   ├── jenkins.yml                  # Jenkins-specific vars (firewall ports)
│   ├── npm.yml                      # NPM firewall configuration
│   ├── hiveops.yml                  # HiveOps settings
│   └── ...
│
├── group_vars/                      # Per-group configuration
│
├── roles/                           # Ansible roles
│   └── common/                      # Common configuration for all servers
│       ├── tasks/
│       │   ├── main.yml
│       │   ├── packages.yml
│       │   ├── security.yml         # Firewall, SSH hardening
│       │   ├── users.yml
│       │   └── timezone.yml
│       └── defaults/
│           └── main.yml             # Default variables
│
├── playbooks/                       # Ansible playbooks
│   ├── site.yml                     # Main playbook (includes all roles)
│   ├── security-audit-v2.yml        # Security audit
│   ├── secure-docker-server-firewall.yml
│   └── ...
│
├── templates/                       # Jinja2 templates
│
└── docs/                            # Documentation
    ├── SECURITY-AUDIT-SUMMARY.md
    ├── JENKINS-CONNECTIVITY-FIX.md
    └── ...
```

## Key Configuration Patterns

### Firewall Management

Firewall is managed by the common role. Configuration is per-host in `host_vars/`:

```yaml
# Example: host_vars/jenkins.yml
common_firewall_enabled: true
common_firewall_allowed_ports:
  - "22/tcp"    # SSH
  - "8080/tcp"  # Jenkins
  - "9000/tcp"  # SonarQube
```

**Firewall Disabled Hosts**:
- docker, hiveops, smartjournal, odoo (disabled for Docker networking)

### SSH Configuration

Most servers use key-only authentication:
```yaml
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin no  # (except Proxmox nodes)
```

**Exception**: Jenkins has password authentication enabled for AWS Jenkins Master.
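
To confirm that a host actually enforces these settings, the effective configuration can be inspected with `sshd -T`, which prints the resolved options one per line with lowercase keywords. The following is a minimal sketch that reads that output on stdin and flags deviations from the key-only policy; the function name `check_sshd_hardening` is hypothetical.

```bash
#!/bin/sh
# Hypothetical helper: read `sshd -T` output on stdin and report any setting
# that differs from the key-only policy described above.
check_sshd_hardening() {
    awk '
        # sshd -T prints one "keyword value" pair per line, keyword in lowercase.
        $1 == "passwordauthentication" && $2 != "no"  { print "FAIL " $1 " " $2; bad = 1 }
        $1 == "pubkeyauthentication"   && $2 != "yes" { print "FAIL " $1 " " $2; bad = 1 }
        $1 == "permitrootlogin"        && $2 != "no"  { print "FAIL " $1 " " $2; bad = 1 }
        END { if (!bad) print "OK"; exit (bad + 0) }
    '
}
```

A possible invocation is `ssh dlxadmin@<server-ip> sudo sshd -T | check_sshd_hardening`. Under the exceptions documented in this file, Jenkins would be expected to fail the password check and the Proxmox nodes may fail the root login check.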

### Spring Boot SSL Offloading

For Spring Boot applications behind Nginx Proxy Manager:

```yaml
environment:
  SERVER_FORWARD_HEADERS_STRATEGY: native
  SERVER_USE_FORWARD_HEADERS: "true"
```

This makes the application honor the proxy's `X-Forwarded-*` headers, which prevents redirect loops when NPM terminates SSL.

### Docker Compose

When `.env` is not in the same directory as the compose file:
```bash
docker compose -f docker/docker-compose.yml --env-file .env up -d
```

**Container updates**: Always recreate (not restart) when changing environment variables.
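
The reason is that `docker compose restart` reuses the existing containers, which keep the environment they were created with, whereas `docker compose up -d` recreates containers whose configuration changed and `--force-recreate` recreates them unconditionally. A minimal sketch using the same paths as the command above; the function name `compose_recreate` and the `DOCKER` override variable are assumptions for illustration.

```bash
#!/bin/sh
# Hypothetical helper: recreate services so new environment values take effect.
# DOCKER is an assumed override hook (for example, DOCKER=echo previews the command).
: "${DOCKER:=docker}"

compose_recreate() {
    # "$@" optionally names the services to recreate; with no arguments, all services.
    "$DOCKER" compose -f docker/docker-compose.yml --env-file .env \
        up -d --force-recreate "$@"
}
```

Calling `compose_recreate` with no arguments recreates every service in the compose file; passing service names restricts it to those services.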

## Critical Knowledge

See `~/.claude/projects/-source-dlx-src-dlx-ansible/memory/MEMORY.md` for detailed infrastructure knowledge including:

- SSL offloading configuration
- Jenkins connectivity troubleshooting
- Storage remediation procedures
- Security audit findings
- Common fixes and solutions

## Common Tasks

### Add New Server

1. Add to `inventory/hosts.yml`:
   ```yaml
   newserver:
     ansible_host: 192.168.200.xxx
   ```

2. Create `host_vars/newserver.yml` (if custom config needed)

3. Run setup:
   ```bash
   ansible-playbook playbooks/site.yml -l newserver
   ```

### Update Firewall Rules

1. Edit `host_vars/<server>.yml`:
   ```yaml
   common_firewall_allowed_ports:
     - "22/tcp"
     - "80/tcp"
     - "443/tcp"
   ```

2. Apply changes:
   ```bash
   ansible-playbook playbooks/site.yml -l <server> --tags firewall
   ```
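
After applying, it can be worth confirming that the rule actually landed on the host. The sketch below parses `ufw status` output from stdin and succeeds only if the given port has an ALLOW rule. The function name `ufw_allows` is hypothetical, and the parsing assumes the usual `To / Action / From` column layout that `ufw status` prints.

```bash
#!/bin/sh
# Hypothetical helper: succeed if `ufw status` output on stdin has an ALLOW rule
# whose first column is exactly the given port/protocol, e.g. 443/tcp.
ufw_allows() {
    awk -v port="$1" '
        # Column 1 is the port, column 2 the action ("ALLOW", or "ALLOW IN" in verbose mode).
        $1 == port && $2 == "ALLOW" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, `ssh dlxadmin@<server-ip> sudo ufw status | ufw_allows 443/tcp && echo "443/tcp is open"`. IPv6 rules, which `ufw` lists with a `(v6)` suffix, are ignored by this sketch.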

### Enable Automatic Security Updates

```bash
ansible all -m apt -a "name=unattended-upgrades state=present" -b
ansible all -m copy -a "dest=/etc/apt/apt.conf.d/20auto-upgrades content='APT::Periodic::Update-Package-Lists \"1\";\nAPT::Periodic::Unattended-Upgrade \"1\";' mode=0644" -b
```
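
To check that both settings took effect on a host, the two `APT::Periodic` values can be read back, either from the file written above or from `apt-config dump`, which prints settings in the same `Key "value";` form. A minimal sketch, with the hypothetical function name `auto_upgrades_enabled`:

```bash
#!/bin/sh
# Hypothetical helper: succeed only if stdin (the 20auto-upgrades file or
# `apt-config dump` output) sets both periodic options to "1".
auto_upgrades_enabled() {
    awk '
        $1 == "APT::Periodic::Update-Package-Lists" && $2 == "\"1\";" { lists = 1 }
        $1 == "APT::Periodic::Unattended-Upgrade"   && $2 == "\"1\";" { upgrade = 1 }
        END { exit (lists && upgrade ? 0 : 1) }
    '
}
```

One way to use it is `ssh dlxadmin@<server-ip> apt-config dump | auto_upgrades_enabled && echo "automatic updates enabled"`.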

### Run Monthly Security Audit

```bash
ansible-playbook playbooks/security-audit-v2.yml
cat docs/SECURITY-AUDIT-SUMMARY.md
```

## Git Workflow

- **Main Branch**: Production-ready configurations
- **Commit Messages**: Descriptive, include what was changed and why
- **Co-Authored-By**: Include for Claude-assisted work
- **Testing**: Always test with `--check` before applying changes

Example commit:
```bash
git add playbooks/new-playbook.yml
git commit -m "Add playbook for X configuration

This playbook automates Y to solve Z problem.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"
```

## Troubleshooting

### SSH Connection Issues

```bash
# Test SSH connectivity
ansible <server> -m ping

# Check SSH with verbose output
ssh -vvv dlxadmin@<server-ip>

# Verify privilege escalation (should print root)
ansible <server> -m shell -a "whoami" -b
```

### Firewall Issues

```bash
# Check firewall status
ansible <server> -m shell -a "ufw status verbose" -b

# Temporarily disable (for debugging)
ansible <server> -m ufw -a "state=disabled" -b

# Re-enable
ansible <server> -m ufw -a "state=enabled" -b
```

### Playbook Failures

```bash
# Run with verbose output
ansible-playbook playbooks/site.yml -vvv

# Check syntax
ansible-playbook playbooks/site.yml --syntax-check

# List tasks
ansible-playbook playbooks/site.yml --list-tasks

# Start at specific task
ansible-playbook playbooks/site.yml --start-at-task="task name"
```

## Security Best Practices

1. **Always test with --check first**
2. **Limit scope with -l when testing**
3. **Keep firewall rules minimal**
4. **Use key-based SSH authentication**
5. **Enable automatic security updates**
6. **Run monthly security audits**
7. **Document changes in Claude memory** (`MEMORY.md`)
8. **Never commit secrets** (use Ansible Vault when needed)

## Important Notes

- Jenkins password auth is intentional (for AWS Jenkins Master access)
- Firewall disabled on docker/hiveops/smartjournal/odoo for Docker networking
- Proxmox nodes may require root login for management
- NPM server (192.168.200.71) handles SSL termination for web services
- Pi-hole (192.168.200.100) provides DNS for internal services

## Resources

- **Documentation**: `docs/` directory
- **Security Audit**: `docs/SECURITY-AUDIT-SUMMARY.md`
- **Claude Memory**: `~/.claude/projects/-source-dlx-src-dlx-ansible/memory/MEMORY.md`
- **Version Controlled Config**: http://192.168.200.102/directlx/dlx-claude

## Maintenance Schedule

- **Daily**: Monitor server health, check failed logins
- **Weekly**: Review and apply security updates
- **Monthly**: Run security audit, review firewall rules
- **Quarterly**: Review and update documentation

---

**Last Updated**: 2026-02-09
**Repository**: http://192.168.200.102/directlx/dlx-ansible (Gitea)
**Claude Memory**: Maintained in ~/.claude/projects/
**Version Controlled**: http://192.168.200.102/directlx/dlx-claude