System Management by Exception: February 2012

Monday, February 27, 2012

Automatic Daily Monitoring of Continuous Processes in Theory and Practice: My CMG'11 Trip Report; Part 3

As I already announced in the following posting, "CMG'11 Abstract Report shows my virtual presence", another great MASF paper was published at the CMG'11 conference:

"Automatic Daily Monitoring of Continuous Processes in Theory and Practice" written and presented by Frank Bereznay & MP Welch.

I attended the session, and here are my comments:

1. The difference between MASF and SPC was stressed: "MASF is a framework and not a detailed statistical method".

2. "... key assumption, our workload is repeatable is some fashion over time. The concept of a repeatable workload is fundamental to any sort of detection testing and needs to be validated before making any investment of time and software into developing a detection system..." That is true!
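The paper is not quoted here on how to validate repeatability, so the following is only a minimal sketch of one possible check, assuming hourly samples: the sample autocorrelation at a lag of 168 hours (one week). The function names and the 0.5 threshold are hypothetical choices for illustration, not something taken from the paper.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        return 0.0  # constant series: no variation to correlate
    num = sum((series[i] - mu) * (series[i + lag] - mu) for i in range(n - lag))
    return num / denom

def looks_weekly_repeatable(hourly, threshold=0.5):
    """Assumption: a lag-168 autocorrelation at or above `threshold`
    is treated as 'repeatable enough'. The threshold is arbitrary."""
    return autocorrelation(hourly, 168) >= threshold
```

A workload whose hourly values repeat week after week gives a value close to 1; a workload with no weekly rhythm gives a value near zero or below.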

3. The weekly 168-hour profile was acknowledged as the best one for MASF analysis:

- the picture from commented paper

I am glad they did that, as I moved from the 24-hour profile to this one long ago. See my 2006 paper; here is the IT-Control Chart from it:

So they suggested having 168 separate groups of data, one for each hour of the week (separate reference sets), which is exactly the technique I have been using since 2006. They stressed that you need at least 5 months of historical data to build that weekly-profile adaptive filtering policy. If you do not have this luxury, they describe a way to reduce the number of groups, for instance by separating shifts.

At this point I would slightly disagree. Having 6 months of hourly summarized historical data is no longer a problem in modern capacity planning processes, especially on mainframes (the platform they used for the demonstration).
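The 168-group idea from comment 3 can be sketched as follows. This is a minimal illustration, not the code from the paper or from my 2006 paper: it assumes hourly `(datetime, value)` samples and mean ± 3 standard deviations as control limits (the `sigmas` parameter is an assumption), and all function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev

def hour_of_week(ts):
    """Map a timestamp to 0..167 (Monday 00:00 -> 0, Sunday 23:00 -> 167)."""
    return ts.weekday() * 24 + ts.hour

def build_weekly_profile(history, sigmas=3.0):
    """Group hourly samples into 168 reference sets and compute limits.

    Returns {hour_of_week: (lower_limit, mean, upper_limit)}.
    """
    groups = defaultdict(list)
    for ts, value in history:
        groups[hour_of_week(ts)].append(value)
    profile = {}
    for how, values in groups.items():
        if len(values) < 2:
            continue  # not enough history for a standard deviation
        mu, sd = mean(values), stdev(values)
        profile[how] = (mu - sigmas * sd, mu, mu + sigmas * sd)
    return profile

def find_exceptions(actual, profile):
    """Return the samples that fall outside the limits of their own hour."""
    exceptions = []
    for ts, value in actual:
        limits = profile.get(hour_of_week(ts))
        if limits is None:
            continue  # no reference set for this hour
        lower, _, upper = limits
        if value < lower or value > upper:
            exceptions.append((ts, value))
    return exceptions

# Usage with synthetic data: 20 weeks of hourly history starting Monday 2012-01-02.
start = datetime(2012, 1, 2)
history = [(start + timedelta(hours=h), (h % 168) + (h // 168) % 2)
           for h in range(168 * 20)]
profile = build_weekly_profile(history)
```

With about 20 weeks of history each of the 168 groups holds only about 20 points, which is why the amount of history matters for this profile.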

4. They published some simple SAS code fragments. I have never done that! But I have started publishing R code and SQL scripts, as they are more popular (and open source) programming systems.

5. They reproduced my favorite IT-Control Chart, but against daily data:

- the picture from commented paper

That is similar to my very early attempt to build an IT-Control Chart in the same 2006 paper of mine:

- that is my 1st Control Chart builder!

But I believe the 168-hour control chart (I call it the IT-Control Chart) is better, in spite of being a bit busy... See another example below:

6. Some techniques for reduction of false positives were discussed.

I am glad they mentioned my way of doing that by using the EV meta-metric:

"One technique for reducing false positives is to measure the area between under the exception (one of Truben’s techniques) to determine the extent of the deviation. In this case, this exception would not likely warrant review and is common when using the Hourly stigmatization of this data." (I believe they misspelled my name: it is Trubin, not Truben...)

Anyway, they referenced some of my papers extensively and even mentioned this blog, and I greatly appreciate that!
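The area-based idea quoted in comment 6 can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the exact EV definition: samples are equally spaced (hourly), so the area reduces to a sum of distances outside the limits, and the function names and the `min_area` cut-off are hypothetical.

```python
def exception_value(actual, lower, upper):
    """Signed area of `actual` outside the control band.

    Positive when the data runs above the upper limit, negative when it
    runs below the lower limit, zero when every sample is inside the band.
    Assumes equally spaced samples, so each sample counts as one unit wide.
    """
    above = sum(a - u for a, u in zip(actual, upper) if a > u)
    below = sum(l - a for a, l in zip(actual, lower) if a < l)
    return above - below

def worth_review(actual, lower, upper, min_area):
    """Flag only exceptions whose area is at least `min_area` (an assumed cut-off).

    Note: areas above and below the band offset each other in this signed form.
    """
    return abs(exception_value(actual, lower, upper)) >= min_area
```

A short, shallow excursion over the upper limit yields a small area and can be filtered out, while a long or deep one yields a large area and stays on the exception list.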

All in all, it is a very good paper and presentation!

Igor Trubin

He started in 1979 as an IBM/370 system engineer. In 1986 he got his PhD in Robotics at St. Petersburg Technical University (Russia) and then worked as a professor teaching CAD/CAM and Robotics for 12 years. He published 30+ papers and made several presentations at conferences related to the Robotics and Artificial Intelligence fields. In 1999 he moved to the US and worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He made other technical presentations at IBM z/Series Expo, SPEC.org, and Southern and Central Europe CMG, and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). He is the author of the “Performance Anomaly Detection” class (at CMG.com). He worked 2 years as the Capacity team lead for IBM, worked for SunTrust Bank for 3 years and then at IBM for 3 years as Sr. IT Architect. Now he works for Capital One bank as IT Manager in Cloud Engineering, and since 2015 he has been a member of the CMG.org Board of Directors. He runs the UT channel iTrubin.

Friday, February 24, 2012

I was the professor at the technical university in Russia - List of courses I taught in 1999

Igor Trubin

Friday, February 17, 2012

Forrester’s “APM and BTM” about CEP - Complex Event Processing

Continuing the subject of the previous post, I looked at another piece of research about APM, published a bit earlier, in 2010, by Forrester Research, Inc., and called “Competitive Analysis: Application Performance Management And Business Transaction Monitoring”. The research can be downloaded here.

I found that this research also acknowledges the importance of “self-learning” techniques for APM and treats them as a part of CEP - Complex Event Processing.

Based on the research,

“..The Next Step: APM, BTM, BPM, And CEP Converge Complex event processing (CEP) is most probably the first step in the evolution of application performance management. All products reviewed are using some form of statistical-based analysis to distinguish normal from abnormal behavior of applications and transactions. Nastel seems to have taken this analysis one step further by adding a level of inference to its solution. Progress Software has already made the jump into CEP by combining its expertise in BTM and BPM. OpTier recently acquired a solution and announced its intention to enter the advanced field of CEP. SL Corporation, based on its process control automation past, has provided event correlation for a long time, and further integrates with major CEP vendors…”

Below are the vendors that Forrester’s research mentioned as having some CEP features (underlined):

BMC

BPPM Application, Database, and Middleware Monitoring with Analytics monitors transactions running through Web application servers and messaging middleware as well as packaged applications like SAP, Oracle Applications, PeopleSoft, and Siebel CRM. Data collected is automatically integrated with a self-learning analytics engine.

NetIQ

AppManager Performance Profiler is a self-learning, continuously configuring, and continuously adapting technology that profiles dynamic application behavior and sends Trusted Alarms that helps troubleshoot system incidents.

IBM

..(Tivoli) proactively defines autothresholds based on normal behavior.

Nastel Technologies

AutoPilot CEP integrates events from AutoPilot and third-party monitoring solutions to provide a predictive analysis of application and transaction behavior (normal versus abnormal) and provides a role-based dashboard.

SL Corporation

RTView Historian allows for persistence of performance metrics via relational databases. The historical data is used for predictive analysis of trends in component and application behavior; historical data provides the ability to create trusted alerts triggered not against fixed thresholds but against dynamically calculated baselines that take into account typical loads during different periods of the workday.

Correlsense

SharePath builds a transaction model for each transaction type to show how it typically utilizes the infrastructure and then creates automatic baselines to provide alerting capabilities and information about a deviation from normal operating tolerances.

Progress Software

Progress Apama (also part of the RPM Suite) can take information from Actional and perform complex pattern detection activities around it, looking for anomalies that Actional might not otherwise detect. This might include, for example, detecting a cross-correlation between different transactions that might be the root cause of an issue.

Igor Trubin

Wednesday, February 15, 2012

Gartner's Magic Quadrant for Application Performance Monitoring and Behavior Learning Engine

I strongly believe that my SEDS, or SETDS (Statistical Exception and Trend Detection System), could be treated as a BLE – Behavior Learning Engine. SEDS or SETDS (the new name I use now) is not recognized by the Gartner research discussed below, but BLE is.

Gartner’s 2011 research (G00215740), called “Magic Quadrant for Application Performance Monitoring” (it can be downloaded here), acknowledged that one of the important functionality dimensions of APM is “Applications Performance Analytics”, which is described in the research and can be seen in the quotes below:

That includes BLE, which is indeed the essential component of Application Performance Analytics. The research includes analyses of several vendors and tools, which are shown in the Quadrant picture below:

But only the following vendors were indicated in the research as having tools with strong behavior learning features:

ASG

BMC Software

CA Technologies

Compuware

IBM

I hope the SETDS implementation offering within the IBM consulting service (which I currently do) could shift the company in that Magic Quadrant to the right...

Igor Trubin
