Saturday, April 25, 2026

From AIOps to Agentic Systems: Why Monitoring Is Not Enough (and Never Was)

For years, the industry has been obsessed with observability.

Dashboards. Alerts. Correlations.
Then came AIOps — promising intelligence on top.

But let’s be honest:

Most AIOps tools today are still just better dashboards.

They detect problems.
Sometimes they explain them.
But very rarely do they fix anything.

For mainframe environments, this gap is even more important. IBM Z systems still run many of the enterprise’s most critical transaction workloads, where CPU, memory, I/O, service classes, batch windows, and subsystem behavior interact in complex ways. AI on the mainframe is not only about adding assistants or anomaly detection. The real opportunity is to combine trusted workload models, mainframe operational context, and governed automation so the platform can recommend — and eventually execute — safe actions before service levels are at risk.

The Missing Step: Action

Across my patent family (with Capital One and two co-authors):

US10437697 (2016)
US11243863 (2019)
US12007869 (2021)

there is a deliberate progression:


[Workload] → [Model] → [Insight] → [Action]

Most systems today stop here:


[Workload] → [Model] → [Insight]   ❌

The real value starts here:


[Workload] → [Model] → [Insight] → [Action]   ✅

Step 1 — Modeling the System (US10437697)

The first patent introduced a core idea:

Model how business activity (transactions) drives system resources (CPU, memory, I/O).

Not thresholds.
Not heuristics.
But statistical relationships.


Transactions ───► CPU / Memory / I/O
           (modeled mathematically)
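As a rough illustration of what "modeled mathematically" can mean, here is a minimal sketch that fits a straight line between transaction rate and CPU utilization. The post does not describe the patent's actual modeling method, so ordinary least squares is an assumption used only as a stand-in. The helper names (`fit_linear`, `predict_cpu`) and the sample numbers are hypothetical.

```python
from statistics import fmean


def fit_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up samples (assumption): transactions per second and CPU utilization (%).
tps = [100, 200, 300, 400, 500]
cpu = [15.0, 25.0, 35.0, 45.0, 55.0]

slope, intercept = fit_linear(tps, cpu)


def predict_cpu(rate):
    """Predicted CPU % at a given transaction rate, using the fitted line."""
    return intercept + slope * rate


forecast = predict_cpu(800)
print(f"CPU % per transaction/s: {slope:.3f}, baseline: {intercept:.1f}%")
print(f"Forecast at 800 tps: {forecast:.1f}%")
```

The point of a fitted relationship over a fixed threshold is that it can answer "what happens at 800 transactions per second?" before that load arrives.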

This was already a shift from traditional monitoring.

Step 2 — Adding Context (US11243863)

The second patent introduced interaction types:

Different workloads behave differently — so model them separately.


Mobile ─┐
Web     ├──► Separate models ───► Better decisions
ATM     ┘

This aligns with what the industry now calls:

service-level observability
topology-aware analysis
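The "separate models" idea from this step can be sketched by fitting one line per interaction type instead of one line for all traffic. This is only an illustration under assumptions: the least squares fit, the `predict_total_cpu` helper, and the sample data are hypothetical and not taken from the patent.

```python
from collections import defaultdict
from statistics import fmean


def fit_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up samples (assumption): (interaction type, transactions/s, CPU %).
samples = [
    ("mobile", 100, 6.0), ("mobile", 200, 11.0), ("mobile", 300, 16.0),
    ("web", 100, 12.0), ("web", 200, 22.0), ("web", 300, 32.0),
    ("atm", 100, 4.0), ("atm", 200, 6.0), ("atm", 300, 8.0),
]

# Group the observations by interaction type.
by_type = defaultdict(lambda: ([], []))
for kind, tps, cpu in samples:
    by_type[kind][0].append(tps)
    by_type[kind][1].append(cpu)

# One (slope, intercept) pair per interaction type.
models = {kind: fit_linear(x, y) for kind, (x, y) in by_type.items()}


def predict_total_cpu(rates):
    """Sum the per-type predictions for a mix of transaction rates."""
    return sum(models[k][1] + models[k][0] * r for k, r in rates.items())


total = predict_total_cpu({"mobile": 400, "web": 100, "atm": 50})
print(models)
print(f"Predicted CPU for the mix: {total:.1f}%")
```

In this made-up data a web transaction costs twice as much CPU as a mobile one, which a single pooled model would blur into one average slope.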

Step 3 — Acting on the Model (US12007869)

This is the key leap.

The latest patent moves beyond analysis:

Use the models to automatically reconfigure the system.


Before:
Workload ───► Overloaded Node

After:
Workload ───► Optimal Node
          (automatically reassigned)

Or more formally:


[Model] → Decision → Remap workloads → Optimize system
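The decision and remap steps can be sketched as a small planner that takes model predictions and proposes a workload-to-node mapping. The greedy rule below (largest workload first, onto the node with the most headroom) is an assumption chosen for brevity, not the method claimed in the patent. The function name, node names, and numbers are hypothetical.

```python
def plan_assignment(predicted_cpu, node_capacity):
    """Greedy placement: largest workload first, onto the node with most headroom.

    Returns the proposed plan and the resulting load per node.
    Nothing is executed here; the plan is only a recommendation.
    """
    load = {node: 0.0 for node in node_capacity}
    plan = {}
    for workload, cpu in sorted(predicted_cpu.items(), key=lambda kv: -kv[1]):
        node = max(load, key=lambda n: node_capacity[n] - load[n])
        if node_capacity[node] - load[node] < cpu:
            raise ValueError(f"no node has room for {workload}")
        plan[workload] = node
        load[node] += cpu
    return plan, load


# Made-up model output (assumption): predicted CPU % per workload.
predicted_cpu = {"web": 40.0, "mobile": 30.0, "atm": 20.0, "batch": 10.0}
node_capacity = {"node-a": 60.0, "node-b": 60.0}

plan, load = plan_assignment(predicted_cpu, node_capacity)
print(plan)
print(load)
```

Returning a plan instead of applying it directly leaves room for a review or approval step before any change is made, which is one way to keep the automation governed.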

This is no longer monitoring.

This is autonomous control.

Why This Matters Now (Agentic AI)

Everyone is talking about:

AI agents
autonomous systems
self-healing infrastructure

But here’s the uncomfortable truth:

You can’t have agentic systems without reliable system models.

On their own, LLMs don’t model system dynamics.
They generate text, not validated operational decisions.

What you need is:


Statistical Models (US10437697)
+ Context Segmentation (US11243863)
+ Autonomous Action (US12007869)

Which leads to:


→ Agentic AIOps

The Real Gap in AIOps Today

Platforms like:

Datadog
Dynatrace
New Relic

are very good at:

✔ Detecting anomalies
✔ Explaining root causes

But still weak at:

❌ Acting autonomously
❌ Continuously optimizing systems

My Take (Provocative Version)

AIOps without action is just observability with better marketing.

The real transition is:


Monitoring → AIOps → Autonomous Systems → Agentic AI Ops

And the key step is exactly what US12007869 enables:

Systems that don’t just understand —
but act based on that understanding.

Final Thought

If your system still depends on humans to:

interpret alerts
decide what to do
execute changes

then it’s not AIOps.

It’s just monitoring — with extra steps.

______________

Reference:

My CMG presentation about the subject: https://cmg.org/wp-content/plugins/s2member-files/proceedings/2017/362_Trubin.pdf

___________________________________________

Disclaimer: this post was written with ChatGPT's help.

Igor Trubin

He started in 1979 as an IBM/370 system engineer. In 1986 he received his PhD in Robotics at St. Petersburg Technical University (Russia) and then worked for 12 years as a professor teaching CAD/CAM and Robotics. He published 30+ papers and gave several conference presentations in the fields of Robotics and Artificial Intelligence. In 1999 he moved to the US and worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He gave other technical presentations at IBM z/Series Expo, SPEC.org, and Southern and Central Europe CMG, and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). He is the author of the “Performance Anomaly Detection” class (at CMG.org). He worked for 2 years as the Capacity team lead for IBM, for SunTrust Bank for 3 years, and then at IBM for 3 years as a Sr. IT Architect. He now works for Capital One bank as an IT Manager in Cloud Engineering, and since 2015 he has been a member of the CMG.org Board of Directors. He runs the YouTube channel iTrubin.

System Management by Exception
