Gets smarter with every investigation.
The agent learns correct queries, correlations, and resolution paths from past incidents. The next time a similar alert fires, it skips the dead ends.
Watch the agent learn in real time.
The same alert fires twice. The first time, the agent explores and makes mistakes. The second time, it remembers and goes straight to the answer.
Query accuracy improvement
On the first encounter, the agent may query the wrong metric name, get an error, and retry with the corrected name. It stores this correction so that next time it queries the right metric on the first try.
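A minimal Python sketch of that correction step, assuming the agent keeps a simple table mapping a failed metric name to the name that worked, and has some way to find the right name after a failure. `MetricCorrections`, `query_metric`, and `lookup` are hypothetical names for illustration, not the product's API. The set `available` stands in for a real metrics backend.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class MetricCorrections:
    """Hypothetical store: metric names that failed -> names that worked."""
    mapping: dict[str, str] = field(default_factory=dict)

    def resolve(self, name: str) -> str:
        # Use a stored correction if there is one; otherwise keep the name as given.
        return self.mapping.get(name, name)

    def record(self, wrong: str, right: str) -> None:
        self.mapping[wrong] = right


def query_metric(memory: MetricCorrections, available: set[str], requested: str,
                 find_correct_name: Callable[[str], str]) -> tuple[str, list[str]]:
    """Return (metric name used, names attempted), learning from a failed attempt."""
    attempts = [memory.resolve(requested)]
    if attempts[0] in available:
        return attempts[0], attempts
    # First attempt failed: look up the right name and retry once.
    corrected = find_correct_name(attempts[0])
    attempts.append(corrected)
    if corrected not in available:
        raise KeyError(f"no metric matching {requested!r}")
    memory.record(requested, corrected)  # remember the fix for next time
    return corrected, attempts


def lookup(name: str) -> str:
    # Hypothetical stand-in for however the agent discovers the correct name.
    return "trace.duration.p95"


available = {"trace.duration.p95"}
memory = MetricCorrections()
first = query_metric(memory, available, "latency.p95", lookup)
second = query_metric(memory, available, "latency.p95", lookup)
```

On the first call the table is empty, so the query tries `latency.p95`, fails, and records the correction. The second call resolves the name before querying and succeeds on the first try.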
First investigation:
get_metrics("latency.p95") ✗ not found
get_metrics("trace.duration.p95") ✓ found
Next investigation:
get_metrics("trace.duration.p95") ✓ first try
Correlation learning
When investigating root cause, the agent explores multiple hypotheses. After finding that a p95 spike was caused by pod memory pressure, it stores this correlation. Next time, it skips the dead ends entirely.
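One way to picture the correlation step is a symptom-to-root-cause table that reorders the hypotheses on the next run. This is a hedged sketch, not the agent's implementation: `investigate` and the stand-in checks are hypothetical, and a real agent would call monitoring tools rather than lambdas that return fixed answers.

```python
from collections.abc import Callable
from typing import Optional


def investigate(symptom: str, checks: dict[str, Callable[[], bool]],
                learned: dict[str, str]) -> tuple[Optional[str], list[str]]:
    """Test hypotheses in order until one explains the symptom.

    checks:  hypothesis name -> function returning True if it is the root cause
    learned: symptom -> hypothesis that explained it last time (updated in place)
    Returns (root cause or None, hypotheses actually checked).
    """
    order = list(checks)
    known = learned.get(symptom)
    if known in checks:
        # A stored correlation moves the known root cause to the front.
        order.remove(known)
        order.insert(0, known)
    tried = []
    for name in order:
        tried.append(name)
        if checks[name]():
            learned[symptom] = name  # store the symptom -> root cause link
            return name, tried
    return None, tried


# Stand-in checks mirroring the example: only pod metrics show a problem.
checks = {
    "recent deploys": lambda: False,
    "config changes": lambda: False,
    "pod memory pressure": lambda: True,
}
learned: dict[str, str] = {}
first = investigate("p95 spike", checks, learned)
second = investigate("p95 spike", checks, learned)
```

The first run tries all three hypotheses in order; the second checks pod memory pressure first and stops there.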
First investigation:
check recent deploys ✗ none
check config changes ✗ none
check pod metrics ✓ OOM 94%
Next investigation:
check deploys: skipped
check config: skipped
get_pod_metrics() ✓ confirmed
What the agent learns.
Every investigation adds to a permanent, compounding knowledge base specific to your infrastructure.
Metric name corrections
Wrong metric names are mapped to correct ones, eliminating repeat query failures.
"latency.p95" → "trace.duration.p95"
Symptom → root cause links
The agent stores which symptoms map to which root causes, so a recurring symptom can skip exploration.
"p95 spike" → "pod memory pressure"
Effective tool sequences
The agent records which tools, in which order, lead to the fastest resolution for a given alert type.
OOM → pod_metrics → container_logs → hpa_status
Dead-end paths
Hypotheses that never lead to root cause are deprioritized permanently.
check deploys – 0/4 investigations
Compounding speed.
Every investigation makes the next one faster. The improvements are cumulative and permanent.
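The dead-end and tool-sequence entries in the list above, and the compounding effect, can be illustrated by keeping per-alert-type counts of how often each check turned out to be the root cause, then ordering the next investigation by that rate. This sketch assumes Laplace-smoothed hit rates, which is a choice made here for illustration rather than a documented feature; `InvestigationMemory` and `run` are hypothetical, and `run` is handed the true cause only to simulate what real checks would report.

```python
from collections import defaultdict


class InvestigationMemory:
    """Hypothetical per-alert-type counts of which checks found the root cause."""

    def __init__(self) -> None:
        # alert type -> check name -> [times tried, times it was the root cause]
        self._stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record(self, alert_type: str, tried: list[str], root_cause: str) -> None:
        """Update the counts after one investigation."""
        for check in tried:
            counts = self._stats[alert_type][check]
            counts[0] += 1
            if check == root_cause:
                counts[1] += 1

    def plan(self, alert_type: str, checks: list[str]) -> list[str]:
        """Order checks by Laplace-smoothed hit rate, best first.

        (hits + 1) / (tried + 2) gives an untried check a neutral 0.5, and
        sorted() is stable, so ties keep the caller's original order.
        """
        stats = self._stats.get(alert_type, {})

        def rate(check: str) -> float:
            tried, hits = stats.get(check, (0, 0))
            return (hits + 1) / (tried + 2)

        return sorted(checks, key=rate, reverse=True)


def run(memory: InvestigationMemory, alert_type: str, checks: list[str],
        actual_cause: str) -> list[str]:
    """Simulate one investigation; `actual_cause` stands in for real check results."""
    tried = []
    for check in memory.plan(alert_type, checks):
        tried.append(check)
        if check == actual_cause:
            break
    memory.record(alert_type, tried, actual_cause)
    return tried


memory = InvestigationMemory()
checks = ["recent_deploys", "config_changes", "pod_metrics"]
first = run(memory, "OOM", checks, "pod_metrics")
second = run(memory, "OOM", checks, "pod_metrics")
```

With no history, the first run walks every check in the given order. After one recorded investigation, `pod_metrics` scores 2/3 against 1/3 for the others and goes first. Under this scoring, a check that failed 0 of 4 times scores 1/6, so it sinks below a check with no history at all instead of being dropped outright.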
See the self-learning agent in action.
Connect your stack and watch the agent get smarter with every investigation.