# Kafka Admin Client `localhost:9092` Warning Fix
## Symptom
During `sj_api` (Spring Boot) startup, the following warnings appear repeatedly:
```
WARN [kafka-admin-client-thread | smart-api-admin-0]
Connection to node -1 (localhost/127.0.0.1:9092) could not be established.
Node may not be available.
```
The application eventually starts successfully but takes ~60 seconds due to retry loops.
## Root Cause
Two separate issues compound each other:
### 1. Kafka has two listeners — services were using the wrong one
`services/kafka.yaml` defines:
```yaml
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,EXTERNAL_LISTENER://192.168.200.114:9092
KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:29092,EXTERNAL_LISTENER://0.0.0.0:9092
```
- `PLAINTEXT://kafka:29092`: internal Docker network (container-to-container traffic)
- `EXTERNAL_LISTENER://192.168.200.114:9092`: external host access (clients outside Docker)

The `.env` had `kafkaservice=kafka:9092`, which connects to the **external** listener.
When a client connects via the external listener, Kafka's metadata response advertises
`192.168.200.114:9092` as the broker address. From inside a container, that address
routes back out through the host instead of staying on the Docker network. Broker
connections therefore depend on host routing, which makes them prone to failure.
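
The listener lookup described above can be made concrete with a minimal shell sketch. It assumes the simple comma-separated `NAME://host:port` format shown in `services/kafka.yaml`. The hypothetical helper `advertised_for_port` first finds the listener name bound to the port a client bootstraps on. It then prints the advertised address registered under that same name, which is the address the broker returns in its metadata. The sketch does not handle IPv6 literals or the listener security-protocol map.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given KAFKA_LISTENERS and KAFKA_ADVERTISED_LISTENERS,
# print the advertised address a client receives after bootstrapping on $3.
# Assumes plain NAME://host:port entries (no IPv6, no protocol-map parsing).
advertised_for_port() {
  local listeners=$1 advertised=$2 port=$3 name
  # Which listener name is bound to the requested port?
  name=$(printf '%s\n' "$listeners" | tr ',' '\n' |
    awk -F'://' -v p="$port" '{ n = split($2, hp, ":"); if (hp[n] == p) { print $1; exit } }')
  [ -n "$name" ] || return 1
  # The broker advertises the address registered under that same listener name.
  printf '%s\n' "$advertised" | tr ',' '\n' |
    awk -F'://' -v want="$name" '$1 == want { print $2; exit }'
}

L='PLAINTEXT://0.0.0.0:29092,EXTERNAL_LISTENER://0.0.0.0:9092'
A='PLAINTEXT://kafka:29092,EXTERNAL_LISTENER://192.168.200.114:9092'
advertised_for_port "$L" "$A" 9092    # prints: 192.168.200.114:9092
advertised_for_port "$L" "$A" 29092   # prints: kafka:29092
```

Bootstrapping on `9092` hands back the host IP. Bootstrapping on `29092` keeps the client on the Docker network name `kafka`.
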
**Fix:** Change `.env` to use the internal PLAINTEXT listener:
```
kafkaservice=kafka:29092
```
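
To confirm the `.env` change, you can read the value back and compare its port with the internal listener. This is a minimal sketch and one possible approach. `check_kafkaservice` is a hypothetical name. The sketch assumes an unquoted `kafkaservice=host:port` line and defaults to the internal port `29092` from `services/kafka.yaml`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm kafkaservice in an .env file uses the internal
# listener port. Assumes an unquoted KEY=VALUE line; 29092 is the internal
# port from services/kafka.yaml and can be overridden per environment.
check_kafkaservice() {
  local env_file=$1 expected_port=${2:-29092} value
  value=$(sed -n 's/^kafkaservice=//p' "$env_file" | head -n 1)
  if [ -z "$value" ]; then
    echo "kafkaservice is not set in $env_file" >&2
    return 1
  fi
  # ${value##*:} strips everything up to the last colon, leaving the port.
  if [ "${value##*:}" != "$expected_port" ]; then
    echo "kafkaservice=$value does not use internal port $expected_port" >&2
    return 1
  fi
  echo "ok: kafkaservice=$value"
}

# Example: check_kafkaservice /opt/smartjournal/.env
```
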
### 2. Spring Boot `dev` profile hardcodes `localhost:9092` for the Kafka admin client
The application jar's `application-dev.yml` sets `localhost:9092` as the default Kafka
bootstrap server. The `KAFKASERVICE` env var only overrides the producer/consumer
clients. The Spring Kafka admin client reads `spring.kafka.bootstrap-servers`,
which was still falling back to the dev profile's `localhost:9092`.
**Fix:** Add `SPRING_KAFKA_BOOTSTRAP_SERVERS` to the api service environment in
`docker-compose-prod.yaml`, pointing at the same value as `KAFKASERVICE`:
```yaml
environment:
- KAFKASERVICE=${kafkaservice}
- SPRING_KAFKA_BOOTSTRAP_SERVERS=${kafkaservice} # <-- add this
```
This overrides the dev profile default for the admin client at container startup.
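
The override only helps if it actually reaches the container, so it is worth checking the running container's environment. The following is a minimal sketch. It assumes `docker exec sj_api env` works, which requires an `env` binary in the image, and the name `check_kafka_env` is hypothetical. The helper reads `KEY=VALUE` lines on stdin. It fails if `SPRING_KAFKA_BOOTSTRAP_SERVERS` is missing, differs from `KAFKASERVICE`, or still mentions `localhost`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read KEY=VALUE lines on stdin and check that the
# admin-client override is set, matches KAFKASERVICE, and avoids localhost.
check_kafka_env() {
  awk '
    # Split on the first "=" only, so values containing "=" survive intact.
    { key = substr($0, 1, index($0, "=") - 1); val = substr($0, index($0, "=") + 1) }
    key == "KAFKASERVICE"                   { svc = val }
    key == "SPRING_KAFKA_BOOTSTRAP_SERVERS" { boot = val }
    END {
      if (boot == "")         { print "SPRING_KAFKA_BOOTSTRAP_SERVERS is not set"; exit 1 }
      if (boot != svc)        { print "mismatch: " boot " vs KAFKASERVICE=" svc; exit 1 }
      if (boot ~ /localhost/) { print "still points at localhost: " boot; exit 1 }
      print "ok: " boot
    }'
}

# Example: docker exec sj_api env | check_kafka_env
```
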
## Files Changed
| File | Change |
|---|---|
| `/opt/smartjournal/.env` | `kafkaservice=kafka:9092` → `kafkaservice=kafka:29092` |
| `/opt/smartjournal/docker-compose-prod.yaml` | Added `SPRING_KAFKA_BOOTSTRAP_SERVERS=${kafkaservice}` to `api` service environment |
## Result
- No more `localhost:9092` warnings
- Startup time: ~60 seconds → ~20 seconds
## Applying to Another Environment
1. **Check Kafka listeners** — ensure the internal listener (PLAINTEXT) is on a different
port from the external listener and that `kafkaservice` in `.env` points to the internal one:
```
kafkaservice=kafka:<internal-port>
```
2. **Add the Spring override** to the api service in `docker-compose-prod.yaml`:
```yaml
- SPRING_KAFKA_BOOTSTRAP_SERVERS=${kafkaservice}
```
3. **Recreate the api container** (a plain `restart` does not pick up changed environment variables; the container must be recreated):
```bash
docker compose -f docker-compose-prod.yaml up -d --force-recreate api
```
4. **Verify** — startup should complete in ~20 seconds with no `localhost` warnings:
```bash
docker logs sj_api 2>&1 | grep -E 'localhost.*9092|Started UiApplication'
```
Expected: only the `Started UiApplication in XX seconds` line, no localhost warnings.
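
Step 4 can be turned into a pass/fail check for scripts or CI. This is a minimal sketch, and the name `verify_startup_log` is hypothetical. It assumes the same log patterns as the `grep` above: `localhost.*9092` for the warning and `Started UiApplication` for success. The helper reads logs on stdin and fails if any warning matches or the startup line is missing. Otherwise it prints the startup line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read startup logs on stdin and fail if a localhost:9092
# warning appears or the "Started UiApplication" line is missing.
verify_startup_log() {
  local log
  log=$(cat)
  if printf '%s\n' "$log" | grep -qE 'localhost.*9092'; then
    echo "FAIL: localhost:9092 warnings still present" >&2
    return 1
  fi
  if ! printf '%s\n' "$log" | grep -q 'Started UiApplication'; then
    echo "FAIL: no 'Started UiApplication' line found" >&2
    return 1
  fi
  # Print the startup line, which reports the elapsed startup time.
  printf '%s\n' "$log" | grep 'Started UiApplication' | head -n 1
}

# Example: docker logs sj_api 2>&1 | verify_startup_log
```
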
## Related Issues Found During This Session
- A `mfa_enabled=fasle` typo in `.env` caused an `Invalid boolean value` startup crash.
Fixed by correcting it to `mfa_enabled=false`.
- Duplicate `SAML-MAPPER-GRAPH-PROXY-PORT` entries in `docker-compose-prod.yaml`, one interpolating a hyphenated variable name and one an underscored name:
```yaml
- SAML-MAPPER-GRAPH-PROXY-PORT=${saml-mapper-graph-proxy-port} # broken (hyphen)
- SAML-MAPPER-GRAPH-PROXY-PORT=${saml_mapper_graph_proxy_port} # correct (underscore)
```
Compose interpolation follows shell parameter-expansion syntax. It therefore reads
`${saml-mapper-graph-proxy-port}` as `${saml}` with the fallback value
`mapper-graph-proxy-port`, which is used when `saml` is unset. The port env var then
receives a string instead of an integer, crashing Spring Boot. Fixed by removing the
hyphenated duplicate lines.
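
You can confirm the expansion rule in a shell, because Compose interpolation uses the same `${VAR-default}` form. The sketch below first reproduces the broken value. It then defines a hypothetical `find_hyphen_interpolations` helper, which lists lines where `${` is followed by a variable name and then a hyphen. The helper also flags intentional `${VAR-default}` uses, so treat its output as a review list rather than a verdict.

```bash
#!/usr/bin/env bash
# With `saml` unset, the hyphenated reference expands to its literal fallback.
unset saml
echo "${saml-mapper-graph-proxy-port}"   # prints: mapper-graph-proxy-port

# Hypothetical helper: print matching lines (with line numbers) that contain a
# ${name-...} reference, where the hyphen acts as a default-value separator.
find_hyphen_interpolations() {
  grep -nE '\$[{][A-Za-z_][A-Za-z0-9_]*-' "$1"
}

# Example: find_hyphen_interpolations /opt/smartjournal/docker-compose-prod.yaml
```
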