System Management by Exception: August 2011

Tuesday, August 23, 2011

CMG'11 papers about non-statistical ways to capture outliers/anomalies and trends

Monitoring Performance QoS using Outliers
Eugene Margulis, Telus
Commonly used Performance Metrics often measure technical parameters that the end user neither knows nor cares about. The statistical nature of these metrics assumes a known underlying distribution when in reality such distributions are also unknown. We propose a QoS metric that is based on counting the outliers - events when the user is clearly “dis”-satisfied based on his/her expectation at the moment. We use outliers to track long term trends and changes in performance of individual transactions as well as to track system-wide freeze events that indicate system-wide resource exhaustion.
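To make the counting idea concrete, here is a minimal sketch of an outlier-based QoS number. It is not the method from the paper, whose details are not in the abstract. The `Transaction` record, the `expected_s` field, and the `tolerance` multiplier are my own assumptions for illustration: a transaction counts as an outlier when its response time exceeds the user's expectation by more than the tolerance factor.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    name: str
    response_s: float  # observed response time, in seconds
    expected_s: float  # assumed user expectation for this transaction type


def outlier_rate(transactions, tolerance=3.0):
    """Fraction of transactions slower than tolerance x the expected time.

    No distribution is assumed; the events are simply counted.
    """
    if not transactions:
        return 0.0
    outliers = sum(
        1 for t in transactions if t.response_s > tolerance * t.expected_s
    )
    return outliers / len(transactions)


sample = [
    Transaction("login", 0.4, 0.5),
    Transaction("login", 2.1, 0.5),   # outlier: 2.1 > 3 * 0.5
    Transaction("search", 1.0, 1.0),
    Transaction("search", 2.9, 1.0),  # slow, but still within 3 * 1.0
]
print(outlier_rate(sample))  # 0.25
```

Tracking this rate per transaction type over time would give the long-term trend, and a sudden jump across all transaction types at once would hint at a system-wide freeze.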

BTW, I have already tried to "count" outliers; see my
2005 paper listed here: http://itrubin.blogspot.com/2007/06/system-management-by-exception.html

I used the SEDS database to count and analyze exceptions:

Introduction to Wavelets and their Application for Computer Performance Trend and Anomaly Detection:
Introduction to wavelets and their application for computer performance analysis. Wavelets are a set of waveforms that can be used to match a signal or noise. Unlike Fourier analysis, wavelets come in various families. Wavelets are stretched (scaled) in time AND frequency and correlated with the signal. The correlation in time and frequency is displayed as a heat map: the color is the intensity, the X axis is the time, and the Y axis is the frequency. The heat map shows the time when a trend or anomaly starts and when it repeats (frequency).
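The stretch-and-correlate step can be sketched with NumPy alone. This is only an illustration under my own assumptions: the abstract does not name a wavelet family, so I picked the Ricker ("Mexican hat") wavelet, and the function names `ricker` and `scalogram` are hypothetical. The result is the 2-D array that would be drawn as the heat map, with one row per scale and one column per time step.

```python
import numpy as np


def ricker(points, a):
    """Ricker (Mexican hat) wavelet of width `a`, sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def scalogram(signal, widths):
    """Correlate the signal with the wavelet stretched to each width.

    Returns an array of shape (len(widths), len(signal)). The Ricker
    wavelet is symmetric, so convolution and correlation coincide here.
    """
    signal = np.asarray(signal, dtype=float)
    out = np.empty((len(widths), len(signal)))
    for i, a in enumerate(widths):
        # Odd kernel length keeps the wavelet centered on a sample.
        n = min(10 * int(a) + 1, len(signal))
        out[i] = np.convolve(signal, ricker(n, a), mode="same")
    return out


# A flat metric with a single spike at time index 120.
signal = np.zeros(256)
signal[120] = 5.0
heat = scalogram(signal, np.arange(1, 17))

# The strongest response sits at the narrowest scale, at the time of the spike.
row, col = np.unravel_index(np.argmax(np.abs(heat)), heat.shape)
print(heat.shape, row, col)
```

Small widths respond to sharp, short events like the spike above, while large widths respond to slow trends, which is how the heat map separates the two.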

Igor Trubin

He started in 1979 as an IBM/370 system engineer. In 1986 he got his PhD in Robotics at St. Petersburg Technical University (Russia) and then worked as a professor teaching CAD/CAM and Robotics for 12 years. He published 30+ papers and made several presentations at conferences in the Robotics and Artificial Intelligence fields. In 1999 he moved to the US and worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He made other technical presentations at IBM z/Series Expo, SPEC.org, and Southern and Central Europe CMG, and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). Author of the “Performance Anomaly Detection” class (at CMG.com). Worked 2 years as the Capacity team lead for IBM, worked for SunTrust Bank for 3 years and then at IBM for 3 years as Sr. IT Architect. Now he works for Capital One bank as an IT Manager in Cloud Engineering, and since 2015 he has been a member of the CMG.org Board of Directors. Runs UT channel iTrubin

CMG'11 Abstract Report shows my virtual presence

The CMG'11 agenda is online now. The Abstract report shows the following paper related to this blog subject:

1. A Real-World Application of Dynamic Thresholds for Performance Management by Jonathan B Gladstone

He published some material on this blog that most likely is included in his CMG paper:

Jonathan Gladstone: Threshold Management Diagram

Feb 17, 2011

Jonathan Gladstone has worked with a team to implement pro-active Mainframe CPU usage monitoring, basing his design partly on presentations and conversations with Igor Trubin (currently of IBM) and Boris Ginis (of BMC Software).

Here is the abstract from the Abstract report:

The author describes a real application of dynamic thresholds as developed at BMO Financial Group. The case shown uses performance management data from IBM mainframes, but the method would work equally well for detecting deviations from normal patterns in any time-series data including resource utilization in distributed systems, storage, networks or even in non-IT applications such as traffic or health management. This owes much to previous work by well-regarded CMG participants Igor Trubin (currently at IBM), Boris Zibitsker (BEZ Systems) and Boris Ginis (BMC Software).
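For readers new to the idea, here is a rough sketch of one common form of dynamic threshold, in the MASF spirit this blog keeps returning to: build separate limits for each hour of the week from history, then flag current readings that fall outside them. This is not the BMO implementation, which the abstract does not describe. The function names, the `(hour_of_week, value)` input shape, and the `k = 3` multiplier are my assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_thresholds(history, k=3.0):
    """history: iterable of (hour_of_week, value) pairs.

    Returns {hour_of_week: (lower, upper)} using mean +/- k standard deviations.
    """
    groups = defaultdict(list)
    for hour, value in history:
        groups[hour].append(value)
    limits = {}
    for hour, values in groups.items():
        if len(values) < 2:  # stdev needs at least two points
            continue
        m, s = mean(values), stdev(values)
        limits[hour] = (m - k * s, m + k * s)
    return limits


def exceptions(current, limits):
    """Return the (hour, value) pairs that fall outside their hour's limits."""
    flagged = []
    for hour, value in current:
        if hour in limits:
            lower, upper = limits[hour]
            if value < lower or value > upper:
                flagged.append((hour, value))
    return flagged


# CPU utilization (%) observed in past weeks for two hours of the week.
history = [(10, v) for v in (50, 52, 48, 51, 49)] + \
          [(3, v) for v in (10, 12, 11, 9, 13)]
limits = build_thresholds(history)

print(exceptions([(10, 70), (3, 11)], limits))  # [(10, 70)]
```

Because the limits follow the weekly pattern, a reading of 70% at a busy hour and the same reading at a quiet hour are judged against different baselines, which is the point of making the thresholds dynamic.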

2. Automatic Daily Monitoring of Continuous Processes in Theory and Practice by Frank Bereznay

Monitoring large numbers of processes for potential issues before they become problematic can be time consuming and resource intensive. A number of statistical methods have been used to identify change due to a discernable cause and separate it from the fluctuations that are part of normal activity. This session provides a case study of creating a system to track and report these types of changes. Determining the best level of data summarization, control limits, and charting options will be examined as well as all of the SAS code needed to implement the process and extend its functionality.

I believe that paper is based on the presentation he did at Southern CA CMG this year, which I have already mentioned in my following post: "The Master of MASF"

I have not written any paper for this year (the first time in the last 10 years!), but I am glad that the technology I have been promoting for years is still being presented at this year's CMG conference, with some references to my work!

Igor Trubin

Tuesday, August 16, 2011

"The Master of MASF"

The following paper was recently presented at Southern California CMG (SCCMG):

Automatic Daily Monitoring of Continuous Processes
Theory and Practice
by

MP Welch – Merrill Consultants

Frank Bereznay - IBM

That is another great paper that promotes the MASF approach to system performance monitoring, which is actually the main subject of this blog. Most likely that paper will be presented again and published at the international CMG'11 conference.

I am very proud that I was called "The Master of MASF" at that presentation! Thank you, Frank!

Here is the link to the presentation file I found via Google, which has the following pages referencing my work and also this blog:

[PPT]

Automatic Daily Monitoring of Continuous Processes Theory and Practice

The paper also has good references to the work of Ron Kaminski and Dima Seliverstov. Both authors, as well as Frank Bereznay, have already been mentioned in this blog:

See the following posts for Frank Bereznay's work:

System Management by Exception: CMG'06: Performance Data ...

Aug 13, 2007

2006 Best Paper Award paper: Did Something Change? Using Statistical Techniques to Interpret Service and Resource Metrics. Frank M. Bereznay, Kaiser Permanente LINK: http://cmg.org/conference/cmg2006/awards/6139.pdf ...

CMG'09: Performance Data Statistical Exceptions Analysis (Review)

Nov 05, 2010

Brian Barnett, Perry Gibson, and Frank Bereznay. That paper has a deep discussion about normality of performance data, showing examples where MASF approach does not work. The Survival Analysis that does not require any knowledge of how...

For Ron Kaminski's work:

CMG'08 Trip Report

Jan 24, 2009

and Ron Kaminski, who expressed some interest in my EV algorithm to capture recent bad trends, as that solves some problems of workload pathology recognition on which he has been working recently. So You Want to Manage Your z-Series MIPS?

And for Dima Seliverstov's work:

The Exception Value Concept to Measure Magnitude of Systems ...

Dec 10, 2010

At the CMG'10 conference I met BMC Software specialist Dima Seliverstov, and he mentioned referencing my first CMG'01 paper in his CMG presentation (scheduled to be presented TODAY!). I looked at his paper "Application of Stock Market...

Igor Trubin
