directlx-claude-config/plans/cheeky-meandering-kazoo.md

487 lines
21 KiB
Markdown

# Application Log Error Monitoring Module - Implementation Plan
## Context
The user needs to monitor ATM application log files (APLog*.log) for ERROR-level entries and report them to the hiveops-incident backend. Additionally, errors should be correlated with transactions when possible to provide context for troubleshooting.
**Why this is needed**: Application errors often indicate software issues that can lead to transaction failures or service degradation. By capturing and reporting these errors to the incident management system, operators can proactively identify and resolve problems before they impact customers.
**Current state**: The system already monitors journal files for hardware events (card reader failures, cassette issues, etc.) via the journal-events module. However, application-level software errors are not captured.
**File encodings**:
- **Application logs** (`examples/20260215_APP/APLog20260215.log`): UTF-8/ASCII encoding, straightforward to parse
- **Device journals** (`examples/20260215_EJ/ej_BP000125_20260215.txt`): UTF-16 Little Endian with BOM, requires UTF16LEReader
- **Server journals** (`examples/20260215_EJ/20260215.jrn`): UTF-8 (server-processed), not used by agent
**Important path structure**:
- Application log files are organized in **date-based subdirectories**: `d:\MoniPlus2SLog\20260215\APLog20260215.log`
- Each day creates a new folder (e.g., `20260215`, `20260216`, etc.)
- The AppLogSource must navigate to the correct date folder to find the current log file
## Approach
Create a new module called **app-log-events** within the existing `hiveops-journal` Maven module. This module will:
1. Monitor application log files for ERROR entries
2. Parse and categorize errors by type (card, dispenser, network, encryption, etc.)
3. Optionally correlate errors with transactions using timestamp proximity
4. Send structured error events to the atm-incident backend
5. Track file position to avoid reprocessing on restart
**Design decision**: Build within the existing hiveops-journal module (rather than creating a new Maven module) because it shares the same destination backend, HTTP client infrastructure, and conceptual domain (time-series ATM operational data).
## Implementation Steps
### 1. Add New Event Types (Agent Side)
**File**: `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/events/EventType.java`
Add application error event types after line 47:
```java
// Application Error Events
APPLICATION_ERROR, // Generic application error
APPLICATION_ERROR_CARD, // Card-related application error
APPLICATION_ERROR_DISPENSER, // Dispenser-related error
APPLICATION_ERROR_NETWORK, // Network-related error
APPLICATION_ERROR_ENCRYPTION, // Encryption/security error
APPLICATION_ERROR_JOURNAL // Journal upload error
```
### 2. Create AppLogSource Class
**File**: `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/AppLogSource.java` (new)
Similar to `JournalSource` but specialized for application logs with **date-based subdirectories**:
```java
package com.hiveops.applogs;
import java.io.File;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
public class AppLogSource {
private final String baseDir; // e.g., "d:\MoniPlus2SLog"
private final String filenameFormat; // "APLog{YYYY}{MM}{DD}.log"
private final String atmName;
public AppLogSource(String baseDir, String filenameFormat, String atmName) {
this.baseDir = baseDir;
this.filenameFormat = filenameFormat;
this.atmName = atmName;
}
public File getCurrentLogFile() {
String date = LocalDate.now().format(DateTimeFormatter.ofPattern("yyyyMMdd"));
// Application logs are in date-based subdirectories
// e.g., d:\MoniPlus2SLog\20260215\APLog20260215.log
File dateDir = new File(baseDir, date);
String filename = filenameFormat
.replace("{YYYY}", date.substring(0, 4))
.replace("{MM}", date.substring(4, 6))
.replace("{DD}", date.substring(6, 8));
return new File(dateDir, filename);
}
// Getters...
}
```
**Pattern**: Extends JournalSource pattern with date-based subdirectory navigation.
### 3. Create AppLogParser Interface and Implementation
**File**: `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/AppLogParser.java` (new)
```java
package com.hiveops.applogs;
import com.hiveops.events.dto.CreateJournalEventRequest;
import java.util.List;
public interface AppLogParser {
List<CreateJournalEventRequest> parseLine(String line, String agentAtmId);
}
```
**File**: `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/SimpleAppLogParser.java` (new)
Key responsibilities:
- Parse log line format: `ERROR [YYYY-MM-DD HH:MM:SS-mmm] [Class.Method] Message`
- Extract timestamp, class/method, and message
- Categorize error using regex patterns (configurable via properties)
- Build CreateJournalEventRequest with eventSource="HIVEOPS_AGENT_APPLOG"
**Pattern to reuse**: Follow SimpleJournalEventParser.java structure with regex-based pattern matching and configurable patterns via properties.
**Example categorization**:
```java
private EventType categorizeError(String className, String method, String message) {
String combined = (className + "." + method + " " + message).toLowerCase();
if (cardErrorPattern.matcher(combined).find())
return EventType.APPLICATION_ERROR_CARD;
if (dispenserErrorPattern.matcher(combined).find())
return EventType.APPLICATION_ERROR_DISPENSER;
if (networkErrorPattern.matcher(combined).find())
return EventType.APPLICATION_ERROR_NETWORK;
if (encryptionErrorPattern.matcher(combined).find())
return EventType.APPLICATION_ERROR_ENCRYPTION;
if (journalErrorPattern.matcher(combined).find())
return EventType.APPLICATION_ERROR_JOURNAL;
return EventType.APPLICATION_ERROR; // default
}
```
**Default patterns** (configurable via properties):
- Card: `cardreader|idc|chip.*error|card.*(fail|jam|stuck)`
- Dispenser: `cashdispenser|brm|dispens.*error|cash.*jam`
- Network: `network|connection|socket|tcp.*error`
- Encryption: `encrypt|decrypt|certificate|crypto`
- Journal: `ejournaluploader|journal.*upload`
### 4. Create Transaction Correlator (Optional)
**File**: `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/TransactionCorrelator.java` (new)
Only create this if `applog.events.correlation.enabled=true`. Responsibilities:
- Load recent transactions from **device journal files** (UTF-16LE encoded, e.g., `ej_BP000125_20260215.txt`)
- Maintain in-memory timeline using circular buffer
- Find nearest transaction for a given error timestamp (within 30-second window)
- Enrich event details with transaction context
**Important**: Device journal files use UTF-16 Little Endian encoding with BOM (`ff fe`). Use the existing `UTF16LEReader` class from `/source/hiveops-src/hiveops-agent/hiveops-core/src/main/java/com/hiveops/http/UTF16LEReader.java` to read these files properly.
**Data structure**:
```java
class TransactionContext {
LocalDateTime timestamp;
String sequenceNumber;
String transactionType;
}
```
**Pattern**: Use Apache Commons CircularFifoQueue for memory-efficient transaction history.
**Journal parsing strategy**:
```java
// Read device journal file with UTF-16LE encoding
JournalSource journalSource = findJournalSource(context);
File journalFile = journalSource.getCurrentJournalFile();
UTF16LEReader reader = new UTF16LEReader();
// Parse lines for transaction markers
Pattern txnStartPattern = Pattern.compile("\\[.*?\\]TRANSACTION START");
Pattern txnSeqPattern = Pattern.compile("Trans SEQ Number \\[(\\d+)\\]");
```
**Enrichment example**:
```
Original: "ERROR [2026-02-15 00:02:37-678] [Encryption.EncryptString] Error found while encrypting"
Enriched: "[Transaction 4727 @ 2026-02-15T00:02:10] ERROR [2026-02-15 00:02:37-678] [Encryption.EncryptString] Error found while encrypting"
```
### 5. Create AppLogEventProcessor
**File**: `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/AppLogEventProcessor.java` (new)
Background thread that:
1. Monitors app log file for changes (polling-based, similar to JournalEventProcessor)
2. Reads new content from last byte offset using RandomAccessFile
3. Parses ERROR lines using AppLogParser
4. Optionally correlates with transactions
5. Batches events and sends via IncidentEventClient
6. Persists position to survive restarts
**Position file**: `{applog.dir}/{atmName}-applogs.position`
```properties
filename=APLog20260216.log
position=1245678
lastProcessedTime=2026-02-16T10:35:22.456
```
**Pattern to reuse**: Copy the structure of JournalEventProcessor.java, adapting for line-by-line processing instead of chunk uploading.
**Key loop structure**:
```java
while (running) {
File currentFile = source.getCurrentLogFile();
ProcessingState state = loadState();
// Read new content from last position
List<String> newLines = readNewLines(currentFile, state.position);
List<CreateJournalEventRequest> events = new ArrayList<>();
for (String line : newLines) {
events.addAll(parser.parseLine(line, atmName));
}
// Send batch
if (!events.isEmpty()) {
eventClient.sendEvents(events);
saveState(state);
}
Thread.sleep(recheckDelayMs);
}
```
### 6. Create AppLogEventModule
**File**: `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/AppLogEventModule.java` (new)
AgentModule implementation for lifecycle management:
```java
package com.hiveops.applogs;
import com.hiveops.core.module.AgentModule;
import com.hiveops.core.module.ModuleContext;
import com.hiveops.core.module.ModuleInitializationException;
import com.hiveops.events.IncidentEventClient;
import com.hiveops.http.HttpClientSettings;
public class AppLogEventModule implements AgentModule {
private AppLogEventProcessor processor;
private Thread thread;
@Override
public String getName() { return "app-log-events"; }
@Override
public String getVersion() { return "1.0.0"; }
@Override
public List<String> getDependencies() {
// Depend on journal-upload for transaction correlation
return Arrays.asList("journal-upload", "journal-events");
}
@Override
public void initialize(ModuleContext context) throws ModuleInitializationException {
Properties props = context.getMainProperties();
// Check if enabled
boolean enabled = Boolean.parseBoolean(
props.getProperty("applog.events.enabled", "true"));
String incidentEndpoint = props.getProperty("incident.endpoint");
if (!enabled || incidentEndpoint == null) {
return; // isEnabled() will return false
}
// Load configuration
String logDir = props.getProperty("applog.events.dir");
String filenameFormat = props.getProperty("applog.events.filename.format",
"APLog{YYYY}{MM}{DD}.log");
// Create components
AppLogSource source = new AppLogSource(logDir, filenameFormat,
context.getAtmName());
HttpClientSettings settings = new HttpClientSettings();
settings.setEndpoint(incidentEndpoint);
IncidentEventClient client = new IncidentEventClient(settings,
context.getAtmName(),
context.getCountry());
AppLogParser parser = new SimpleAppLogParser(props);
// Optional: create correlator if enabled
TransactionCorrelator correlator = null;
if (Boolean.parseBoolean(props.getProperty("applog.events.correlation.enabled", "false"))) {
JournalSource journalSource = findJournalSource(context);
if (journalSource != null) {
correlator = new TransactionCorrelator(journalSource, 30, 1000);
}
}
long recheckDelay = Long.parseLong(
props.getProperty("applog.events.recheck.delay.msec", "5000"));
int batchSize = Integer.parseInt(
props.getProperty("applog.events.batch.size", "50"));
processor = new AppLogEventProcessor(source, client, parser, correlator,
recheckDelay, batchSize,
context.getAtmName());
}
@Override
public boolean isEnabled(ModuleContext context) {
String endpoint = context.getMainProperties().getProperty("incident.endpoint");
boolean enabled = Boolean.parseBoolean(
context.getMainProperties().getProperty("applog.events.enabled", "true"));
return endpoint != null && enabled && processor != null;
}
@Override
public void start() {
if (processor != null) {
thread = new Thread(processor, "app-log-events");
thread.start();
}
}
@Override
public void stop() {
if (processor != null) processor.stop();
if (thread != null) thread.interrupt();
}
}
```
**Pattern**: Follow the exact structure of JournalEventModule.java.
### 7. Register Module via ServiceLoader
**File**: `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/resources/META-INF/services/com.hiveops.core.module.AgentModule`
Add the new module to the existing file:
```
com.hiveops.journals.JournalUploadModule
com.hiveops.events.JournalEventModule
com.hiveops.applogs.AppLogEventModule
```
### 8. Add Configuration Properties
**File**: `/source/hiveops-src/hiveops-agent/hiveops-app/src/main/resources/hiveops.properties`
Add configuration section:
```properties
# ATM Incident integration endpoint
incident.endpoint=https://incident.bcos.cloud
# Application Log Error Monitoring
applog.events.enabled=true
applog.events.dir=d:\\MoniPlus2SLog
applog.events.filename.format=APLog{YYYY}{MM}{DD}.log
applog.events.recheck.delay.msec=5000
applog.events.batch.size=50
# Transaction correlation (optional)
applog.events.correlation.enabled=true
applog.events.correlation.window.sec=30
applog.events.correlation.max.transactions=1000
# Pattern overrides (optional)
#applog.pattern.APPLICATION_ERROR_CARD=cardreader|idc|chip.*error
#applog.pattern.APPLICATION_ERROR_DISPENSER=cashdispenser|brm|dispens.*error
```
**Note**: `applog.events.dir` is the base directory. The module automatically navigates to date-based subdirectories (e.g., `d:\MoniPlus2SLog\20260215\`)
### 9. Write Tests
**File**: `/source/hiveops-src/hiveops-agent/hiveops-journal/src/test/java/com/hiveops/applogs/SimpleAppLogParserTest.java` (new)
Test cases:
- Parse ERROR line with valid format
- Extract timestamp, class, method, message
- Categorize errors by pattern (card, dispenser, network, etc.)
- Handle malformed lines gracefully
- Test custom pattern configuration
- Test multiple ERROR types in different lines
**Test data** (use the example files in `examples/20260215_APP/`):
```java
@Test
public void testParseCardError() {
String line = "ERROR [2026-02-15 00:04:36-165] [CardReadState.OnAsyncCmdCompMsg] Card Accepting was failed with ERROR";
List<CreateJournalEventRequest> events = parser.parseLine(line, "DLX001");
assertEquals(1, events.size());
assertEquals("APPLICATION_ERROR_CARD", events.get(0).getEventType());
assertEquals("HIVEOPS_AGENT_APPLOG", events.get(0).getEventSource());
}
```
**File**: `/source/hiveops-src/hiveops-agent/hiveops-journal/src/test/java/com/hiveops/applogs/TransactionCorrelatorTest.java` (new)
Test transaction correlation logic (if implementing correlator).
## Critical Files to Modify/Create
### New Files (Create):
1. `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/AppLogSource.java`
2. `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/AppLogParser.java`
3. `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/SimpleAppLogParser.java`
4. `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/AppLogEventProcessor.java`
5. `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/AppLogEventModule.java`
6. `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/applogs/TransactionCorrelator.java` (optional)
7. `/source/hiveops-src/hiveops-agent/hiveops-journal/src/test/java/com/hiveops/applogs/SimpleAppLogParserTest.java`
### Existing Files to Modify:
1. `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/events/EventType.java` - Add APPLICATION_ERROR_* enum values
2. `/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/resources/META-INF/services/com.hiveops.core.module.AgentModule` - Register AppLogEventModule
3. `/source/hiveops-src/hiveops-agent/hiveops-app/src/main/resources/hiveops.properties` - Add configuration properties
## Reusable Existing Functions/Utilities
1. **IncidentEventClient** (`/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/events/IncidentEventClient.java`) - HTTP client for sending events to atm-incident backend (no changes needed)
2. **CreateJournalEventRequest** (`/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/events/dto/CreateJournalEventRequest.java`) - DTO for event payloads (already supports all needed fields)
3. **MonitoredLogFile** (`/source/hiveops-src/hiveops-agent/hiveops-core/src/main/java/com/hiveops/http/MonitoredLogFile.java`) - File monitoring with offset tracking (can be adapted for line-based reading)
4. **UTF16LEReader** (`/source/hiveops-src/hiveops-agent/hiveops-core/src/main/java/com/hiveops/http/UTF16LEReader.java`) - **CRITICAL**: Use this to read device journal files which are UTF-16LE encoded with BOM. Required for transaction correlation.
5. **JournalSource** (`/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/journals/JournalSource.java`) - Pattern for AppLogSource (note: device journals are `ej_*.txt` files in UTF-16LE, different from server journals)
6. **SimpleJournalEventParser** (`/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/events/SimpleJournalEventParser.java`) - Pattern for SimpleAppLogParser
7. **JournalEventProcessor** (`/source/hiveops-src/hiveops-agent/hiveops-journal/src/main/java/com/hiveops/events/JournalEventProcessor.java`) - Pattern for AppLogEventProcessor
## Verification Plan
### Unit Testing
```bash
# Run tests for the new parser
mvn test -Dtest=SimpleAppLogParserTest
# Run all app-log tests
mvn test -pl hiveops-journal -Dtest="com.hiveops.applogs.*"
```
### Integration Testing
1. Build the fat JAR: `mvn clean package -DskipTests`
2. Create test directory structure:
```
mkdir -p d:\MoniPlus2SLog\20260215
cp examples/20260215_APP/APLog20260215.log d:\MoniPlus2SLog\20260215\
```
3. Configure `hiveops.properties` with:
- `applog.events.enabled=true`
- `applog.events.dir=d:\\MoniPlus2SLog`
- `incident.endpoint=https://incident.bcos.cloud`
4. Run the agent: `java -jar hiveops-app/target/hiveops-*-jar-with-dependencies.jar`
5. Verify in logs:
- "Started app log event processor thread"
- "Processing X bytes from d:\MoniPlus2SLog\20260215\APLog20260215.log"
- "Sending Y events to incident backend"
6. Check atm-incident backend (incident.bcos.cloud) for received APPLICATION_ERROR_* events
7. Verify transaction correlation (if enabled) shows transaction sequence numbers
### Manual Verification
1. Monitor a real application log file with live ERROR entries
2. Verify events appear in atm-incident dashboard
3. Check that file position persists across agent restarts
4. Verify no reprocessing of old errors after restart
5. Test log file rotation at midnight (filename changes from APLog20260215.log to APLog20260216.log)
## Notes
- The module is disabled by default if `incident.endpoint` is not configured
- Transaction correlation is optional and can be disabled via `applog.events.correlation.enabled=false`
- Error categorization patterns are configurable via properties for different ATM software versions
- The module shares the same incident backend endpoint as journal-events
- Position tracking ensures no duplicate error reporting across restarts
- File I/O is minimal (only reads new content incrementally)