
Monday, May 18, 2026

From Robot Grasping to Performance Anomaly Detection: Area of Normal Functioning and Exception Value

Many years ago, my PhD dissertation focused on industrial robot grasping processes and assembly accuracy using passive, sensorless adaptation. The practical problem was simple to describe but difficult to solve: how can a robot successfully grasp or assemble an object when there are inevitable errors in the object’s initial position, orientation, and geometry?

The main idea of that research was to calculate a set of initial conditions under which the grasping or assembly process would still succeed. I called this region the Area of Normal Functioning (ANF); in Russian, Область Нормального Функционирования (ОНФ).

In other words, ANF defined the “safe” or “normal” area of operation. If the initial coordinates of the object were inside this area, then passive mechanical adaptation could compensate for small errors and the operation would be successful. If the initial coordinates were outside this area, the process would likely fail.
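As a rough illustration only: once a region like this is known, checking an initial condition against it is a simple membership test. The sketch below assumes the ANF can be approximated by a symmetric box of allowed errors. The box shape, the `ToleranceBox` name, and all numeric limits are hypothetical, and the real region was calculated for the specific grasping or assembly process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToleranceBox:
    """Hypothetical ANF approximation: symmetric limits on initial errors."""
    max_dx_mm: float       # allowed lateral offset along x
    max_dy_mm: float       # allowed lateral offset along y
    max_dtheta_deg: float  # allowed angular misalignment

    def contains(self, dx_mm: float, dy_mm: float, dtheta_deg: float) -> bool:
        """Return True if the initial error lies inside the box."""
        return (abs(dx_mm) <= self.max_dx_mm
                and abs(dy_mm) <= self.max_dy_mm
                and abs(dtheta_deg) <= self.max_dtheta_deg)


# Made-up limits, for illustration only.
anf = ToleranceBox(max_dx_mm=1.5, max_dy_mm=1.5, max_dtheta_deg=2.0)
print(anf.contains(0.8, -1.0, 1.2))  # inside: passive adaptation can compensate
print(anf.contains(2.1, 0.0, 0.5))   # outside: the operation would likely fail
```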

Looking back, this idea has an interesting connection to my later work in IT performance anomaly detection. In my current research, I use the concept of Exception Value (EV), defined as the area between the statistical limits and the actual observed values of system performance variables.

The domains are very different: one is industrial robotics, the other is IT system performance management. But the underlying idea is surprisingly similar.

In robotic grasping and assembly, the question was:

How far can the object’s actual position deviate from the ideal position while the robot operation still succeeds?

In performance anomaly detection, the question becomes:

How far can the actual value of a performance variable deviate from its statistically expected range before we should treat it as an exception?

In both cases, the main focus is not only the ideal or expected value. The more important question is the boundary between normal and abnormal functioning.

For industrial robots, ANF described the range of physical coordinates where passive adaptation was still able to correct errors. For performance data, EV describes the area where actual behavior moves beyond normal statistical expectations.
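One simple discrete reading of EV is to sum, sample by sample, how far the actual value lies beyond the statistical limits. The sketch below assumes equally spaced samples (so each one contributes a unit-width slice of area) and reports the area above the upper limit and below the lower limit separately. The function name and the data are hypothetical.

```python
def exception_value(actual, lower, upper):
    """Sum the excursions of `actual` beyond [lower, upper], one unit-width step per sample.

    Returns (ev_up, ev_down): area above the upper limit, area below the lower limit.
    """
    if not (len(actual) == len(lower) == len(upper)):
        raise ValueError("all series must have the same length")
    ev_up = sum(max(a - u, 0.0) for a, u in zip(actual, upper))
    ev_down = sum(max(l - a, 0.0) for a, l in zip(actual, lower))
    return ev_up, ev_down


# Hypothetical hourly CPU utilization (%) with statistical limits.
actual = [40, 45, 72, 80, 50, 20]
upper = [60, 60, 65, 65, 60, 60]
lower = [30, 30, 35, 35, 30, 30]

ev_up, ev_down = exception_value(actual, lower, upper)
print(ev_up, ev_down)
```

When both values are zero, the variable stayed inside its limits and there is nothing to report, which is the "management by exception" behavior.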

This connection is especially interesting because both ideas are based on “management by exception.” We do not need to react to every small deviation. We need to understand when a deviation becomes meaningful — when it leaves the normal functioning area.

Modern robotics research continues to explore related ideas under different terminology: passive compliance, compliant grasping, remote center compliance, sensorless robotic assembly, peg-in-hole insertion, and adaptive manipulation. Many recent methods also use sensors, machine learning, and vision systems. However, the older idea of defining a normal operating region remains relevant: successful automation depends not only on control algorithms, but also on understanding the tolerance zone where the process can still function correctly.

That is why I now see ANF as an early conceptual predecessor of my later EV work. ANF was about the boundary of successful physical operation. EV is about the boundary of normal statistical behavior.

Different fields. Different data. Same engineering mindset:

define the normal area, measure the deviation, and focus attention on meaningful exceptions.


[Table: “conceptual comparison” of ANF and EV]


Saturday, April 25, 2026

From AIOps to Agentic Systems: Why Monitoring Is Not Enough (and Never Was)

For years, the industry has been obsessed with observability.

Dashboards. Alerts. Correlations.
Then came AIOps — promising intelligence on top.

But let’s be honest:

Most AIOps tools today are still just better dashboards.

They detect problems.
Sometimes they explain them.
But very rarely do they fix anything.


The Missing Step: Action

Across my patent family (with Capital One and two co-authors):

  • US10437697 (2016)
  • US11243863 (2019)
  • US12007869 (2021)

there is a deliberate progression:

[Workload] → [Model] → [Insight] → [Action]

Most systems today stop here:

[Workload] → [Model] → [Insight] ❌

The real value starts here:

[Workload] → [Model] → [Insight] → [Action] ✅

Step 1 — Modeling the System (US10437697)

The first patent introduced a core idea:

Model how business activity (transactions) drives system resources (CPU, memory, I/O).

Not thresholds.
Not heuristics.
But statistical relationships.

Transactions ───► CPU / Memory / I/O
(modeled mathematically)
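A minimal sketch of such a statistical relationship, assuming a single linear dependence of CPU on transaction volume fitted by ordinary least squares. This is a generic illustration rather than the method claimed in the patent, and the data and names are hypothetical.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical hourly samples: transactions per hour and CPU utilization (%).
transactions = [1000, 2000, 3000, 4000, 5000]
cpu_percent = [12.0, 21.5, 32.0, 41.0, 52.5]

intercept, slope = fit_line(transactions, cpu_percent)
predicted_cpu = intercept + slope * 6000  # expected CPU at 6000 transactions/hour
print(f"cpu = {intercept:.2f} + {slope:.5f} * transactions")
print(f"predicted CPU at 6000 tx/h: {predicted_cpu:.1f}%")
```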

This was already a shift from traditional monitoring.


Step 2 — Adding Context (US11243863)

The second patent introduced interaction types:

Different workloads behave differently — so model them separately.

Mobile ──┐
Web ─────┼──► Separate models ───► Better decisions
ATM ─────┘
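To illustrate "model them separately", the sketch below fits one small linear model per interaction type instead of a single model over mixed traffic. The least-squares helper, the type names, and the sample values are assumptions made for the example.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical (transactions, CPU %) samples for each interaction type.
samples = {
    "mobile": [(100, 5.0), (200, 9.0), (300, 13.0)],
    "web": [(100, 8.0), (200, 15.0), (300, 22.0)],
    "atm": [(100, 3.0), (200, 4.0), (300, 5.0)],
}

# One model per interaction type, keyed by type name.
models = {}
for kind, rows in samples.items():
    x = [r[0] for r in rows]
    y = [r[1] for r in rows]
    models[kind] = fit_line(x, y)

for kind, (intercept, slope) in models.items():
    print(f"{kind}: cpu = {intercept:.2f} + {slope:.3f} * transactions")
```

In this made-up data the per-transaction CPU cost differs several times between types, which a single pooled model would blur.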

This aligns with what the industry now calls:

  • service-level observability
  • topology-aware analysis

Step 3 — Acting on the Model (US12007869)

This is the key leap.

The latest patent moves beyond analysis:

Use the models to automatically reconfigure the system.

Before:
Workload ───► Overloaded Node

After:
Workload ───► Optimal Node
(automatically reassigned)

Or more formally:

[Model] → Decision → Remap workloads → Optimize system
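The decision step can be pictured with a generic greedy heuristic: take the demand each workload is predicted to generate and place the heaviest workloads first on whichever node currently has the least load. This is only an illustrative sketch and not the remapping procedure described in the patent. The workload names, node names, and predicted values are hypothetical.

```python
def remap(predicted_load, nodes):
    """Assign each workload to the currently least-loaded node, heaviest first.

    predicted_load: workload name -> predicted CPU demand (model output)
    nodes: list of node names
    Returns (assignment, load per node).
    """
    node_load = {n: 0.0 for n in nodes}
    assignment = {}
    ordered = sorted(predicted_load.items(), key=lambda kv: kv[1], reverse=True)
    for workload, load in ordered:
        target = min(node_load, key=node_load.get)  # node with the least load so far
        assignment[workload] = target
        node_load[target] += load
    return assignment, node_load


# Hypothetical model output: predicted CPU demand per workload.
predicted = {"mobile": 45.0, "web": 30.0, "atm": 10.0, "batch": 25.0}
assignment, node_load = remap(predicted, ["node-a", "node-b"])
print(assignment)
print(node_load)
```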

This is no longer monitoring.

This is autonomous control.


Why This Matters Now (Agentic AI)

Everyone is talking about:

  • AI agents
  • autonomous systems
  • self-healing infrastructure

But here’s the uncomfortable truth:

You can’t have agentic systems without reliable system models.

LLMs don’t understand system dynamics.
They generate text — not operational decisions.

What you need is:

Statistical Models (US10437697)
+ Context Segmentation (US11243863)
+ Autonomous Action (US12007869)

Which leads to:

→ Agentic AIOps

The Real Gap in AIOps Today

Platforms like:

  • Datadog
  • Dynatrace
  • New Relic

are very good at:

✔ Detecting anomalies
✔ Explaining root causes

But still weak at:

❌ Acting autonomously
❌ Continuously optimizing systems


My Take (Provocative Version)

AIOps without action is just observability with better marketing.

The real transition is:

Monitoring → AIOps → Autonomous Systems → Agentic AI Ops

And the key step is exactly what US12007869 enables:

Systems that don’t just understand —
but act based on that understanding.


Final Thought

If your system still depends on humans to:

  • interpret alerts
  • decide what to do
  • execute changes

Then it’s not AIOps.

It’s just monitoring — with extra steps.

______________

Reference:

My CMG presentation about the subject: https://cmg.org/wp-content/plugins/s2member-files/proceedings/2017/362_Trubin.pdf



___________________________________________

Disclaimer: this post was written with ChatGPT's help.

One of my most recent patents with Capital One (US12007869, 2021) is about autonomous / AI ops (AIOps).

The following patent family:
Patent          | Level      | What it protects
2016 (10437697) | Foundation | Build + validate statistical models
2019 (11243863) | Structured | Segment system into interaction types
2021 (12007869) | Adaptive   | Dynamically reconfigure system using models

This progression covers:

✔ Observability / APM tools 

  • Modeling + correlation (Patent 1)

✔ Capacity planning systems

  • Segmented workload modeling (Patent 2)

✔ Autonomous / AI ops (AIOps)

  • Self-optimizing infrastructure (Patent 3)

👉 Together, these patents effectively move toward:

self-driving infrastructure based on statistical modeling

Those patents map very directly to modern AIOps, especially the parts around business/workload demand → resource utilization → model scoring → automated load-balancing/remapping.

Core patent family vs AIOps platform features

Patent concept | Plain-English meaning | Modern AIOps equivalent
Interaction / transaction volume by type | Business workload demand, e.g. mobile banking, ATM, web traffic | Service traffic, request rate, user actions, business events
Device/resource utilization | CPU, memory, disk, network usage | Infrastructure + APM telemetry
Statistical / regression / multivariate models | Model relationship between workload and resource consumption | ML baselines, anomaly models, predictive analytics
Diagnostic scoring: R², RMSE, strength | Decide which models are reliable | Confidence/scoring of anomalies, correlations, RCA evidence
Filtering weak models | Keep only useful models | Noise reduction / alert suppression
Forecasts | Predict future demand/resource pressure | Bottleneck prediction, capacity forecasting
Remapping devices to interaction types | Use model output to change workload placement | Automated remediation, scaling, routing, load balancing
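The "diagnostic scoring" and "filtering weak models" rows can be sketched as follows: compute R² and RMSE for each candidate model and keep only those above a cutoff. The cutoff value, the model names, and the data are assumptions for illustration. The patents' actual scoring rules are not reproduced here.

```python
from math import sqrt
from statistics import fmean


def diagnostics(actual, predicted):
    """Return (r_squared, rmse) for a fitted model's predictions."""
    mean_actual = fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # must be non-zero
    r_squared = 1.0 - ss_res / ss_tot
    rmse = sqrt(ss_res / len(actual))
    return r_squared, rmse


# Hypothetical observed CPU (%) and predictions from two candidate models.
observed = [10.0, 20.0, 30.0, 40.0]
candidates = {
    "web_cpu": [11.0, 19.0, 31.0, 39.0],  # tracks the data closely
    "atm_cpu": [25.0, 24.0, 26.0, 25.0],  # barely explains the variation
}

MIN_R_SQUARED = 0.8  # assumed cutoff, to be tuned per environment

kept = {}
for name, predicted in candidates.items():
    r2, rmse = diagnostics(observed, predicted)
    if r2 >= MIN_R_SQUARED:
        kept[name] = (r2, rmse)

print(kept)
```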

The strongest overlap is not generic “anomaly detection.” It is business-demand-aware resource modeling that can drive infrastructure decisions.



Friday, April 24, 2026

"Automated Detection of Performance Regressions Using Statistical Process Control Techniques"

Exploring ICPE’12 — A Precedent I Didn’t Expect

I recently came across an interesting paper from ICPE 2012 where my earlier work was cited. It’s always a bit surreal to see your ideas show up in academic research years later—especially in a context that closely aligns with what you’ve been working on.

The Paper

Automated detection of performance regressions using statistical process control techniques

Thanh H.D. Nguyen, Bram Adams, Zhen Ming Jiang, Ahmed E. Hassan
Published by ACM, April 2012

What caught my attention was their discussion of using control charts to detect performance regressions—an approach very close to what I explored back in 2005.

The Connection

In the paper, the authors reference my work:

Trubin et al. [18] proposed the use of control charts for in-field monitoring of software systems where performance counters fluctuate according to input load. Control charts can automatically learn when deviations exceed control limits and alert operators.

They go on to build upon this idea, applying control charts not just to live systems, but to performance regression testing.

Key Idea: Control Charts for Regression Detection

The core concept is elegant:

  • Use historical baseline runs (previous software versions) to establish control limits
  • Compare new test runs against those limits
  • Measure a violation ratio—how often metrics fall outside expected bounds
  • A higher ratio indicates a higher probability of regression
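The four steps above can be sketched in a few lines. The example assumes classical limits of the baseline mean plus or minus three standard deviations; the paper's own choice of limits may differ, and the response-time data is hypothetical.

```python
from statistics import fmean, pstdev


def control_limits(baseline, sigmas=3.0):
    """Center line +/- `sigmas` standard deviations of the baseline runs."""
    center = fmean(baseline)
    spread = pstdev(baseline)
    return center - sigmas * spread, center + sigmas * spread


def violation_ratio(new_run, lower, upper):
    """Fraction of samples in the new run that fall outside the limits."""
    outside = sum(1 for v in new_run if v < lower or v > upper)
    return outside / len(new_run)


# Hypothetical response times (ms): baseline from earlier versions, then a new test run.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
new_run = [101, 99, 112, 115, 100, 118, 98, 111]

lcl, ucl = control_limits(baseline)
ratio = violation_ratio(new_run, lcl, ucl)
print(f"limits: [{lcl:.1f}, {ucl:.1f}], violation ratio: {ratio:.2f}")
```

Here half of the new run's samples fall outside the baseline limits, which would flag the new version for a closer look.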

This aligns closely with the fundamental principle I worked on: detecting anomalies not by fixed thresholds, but by statistically learned behavior.

The Real Challenge

The authors correctly highlight a critical difficulty:

We want to detect deviations in the system (the process), not deviations caused by input variability (the load).

This is the central problem in performance analysis—and one that still trips up many modern monitoring systems.

They also point out two assumptions required for traditional control charts:

  1. Stable (non-varying) input
  2. Normally distributed output

In real-world systems, both assumptions are often violated.

Their Solution: Preprocessing

To address this, they introduce preprocessing steps:

  • Scaling – normalizing data to reduce input-driven variance
  • Filtering – cleaning noise before applying control charts
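As one simple example of reducing input-driven variance, a resource counter can be divided by the load that produced it, so that the control chart sees cost per request instead of raw consumption. This is an assumed illustration of the general idea, not the paper's exact scaling or filtering procedure. The function name and data are hypothetical.

```python
def per_request_cost(counter, load):
    """Normalize a resource counter by the load that produced it."""
    if len(counter) != len(load):
        raise ValueError("counter and load must have the same length")
    # Skip idle intervals, where cost per request is undefined.
    return [c / l for c, l in zip(counter, load) if l > 0]


# Hypothetical samples: CPU seconds consumed and requests served per interval.
cpu_seconds = [10.0, 20.0, 40.0, 0.0]
requests = [100, 200, 400, 0]

print(per_request_cost(cpu_seconds, requests))
```

The raw counter varies fourfold in this made-up data, while the normalized series is flat, so any remaining deviation is more likely to come from the system than from the load.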

This is a practical adaptation, though it also highlights the limitations of applying classical statistical techniques directly to complex software systems.

Looking Back

For reference, the cited work is:

[18] I. Trubin. Capturing workload pathology by statistical exception detection system.
Computer Measurement Group (CMG), 2005.

It’s interesting to see how the idea of statistical exception detection—especially under variable workloads—continues to evolve and reappear in different forms.

Final Thoughts

What I find most encouraging is that the core idea still holds:

Performance anomalies should be detected relative to expected behavior, not absolute thresholds.

Whether you call it control charts, anomaly detection, or change point analysis—the principle remains the same.

And it’s a good reminder: sometimes ideas don’t just age… they propagate.




Monday, April 20, 2026

#ICPE2026 workshop presentation "Detecting past and future change points in performance data for education and practice"

 See announcement and abstract  HERE



Friday, January 2, 2026

My next patent application is officially published: "SYSTEMS AND METHODS FOR PROACTIVE WORKLOAD MANAGEMENT"