System Management by Exception: Adrian Heald: A simple control chart using Captell Version 6

Sunday, June 3, 2012

Adrian Heald: A simple control chart using Captell Version 6

At the CMG'11 I have met Adrian and asked him to show me how his reporting tool “Captell” (www.reportingservices.com) can be used to build MASF Control Charts. Below is his response.Check also the comment to this post with my feedback.
_____________________________________________________________________________
Introduction

A control chart uses data from a specified period to derive average and upper and lower control values. For this example we are using some CPU utilization data from a UNIX machine collected over the 4 month period January through April 2011 and delivered in a CSV file. The baseline period is January, from which we calculate average values and standard deviations for each clock hour. We can then plot our control chart and compare successive month’s average with the control to see a clear picture of change.

For more information and sample reports see www.reportingservices.com
or contact
Adrian Heald
on +61 (0)411 238 755
adrian@reportingservices.com

Step 1 - Import the CPU utilization data.

The following dialog shows the table definition selecting the “Delimited text file” source type. Specify a name and folder and choose the source type.

Here we see the text file definition, all that is required is the filename and a specification of the date time format.

Step 2 – Import and view the data

This Window shows the main Captell dialog with the task importing the data

And here a view of the imported data; during the import of the data Captell automatically determines correct data types.

Step 3 – Create a query to calculate the base line

This query calculates the average and average +/- 2 standard deviations for data from January.

The query output.

Step 4 – Create a query to summarise single months data

This query calculates the average CPU for each hour throughout the month selected by the Captell parameter ‘Data\Month’.

The query output:

Step 5 – Create a chart to combine the two queries

This chart shows the baseline average CPU utilisation and upper control limit along with the average values from the current month. Captell’s ability to plot data from different sources, in this case the baseline data and the data from the new month makes reporting quite easy. The blue line with the square symbols shows the average hourly data for March, well within the control limit and all hourly values below the baseline average.

Step 6 – Change the parameter to compare a different month

Here we can see the parameter changed to April and the resultant chart. The blue line with the square symbols shows the average hourly data for April, mostly above the upper control limit and all but one hour above the January mean, indicating a substantial increase in utilization.

(Posted with the Adrian's Heald permission)

Igor Trubin

He started in 1979 as IBM/370 system engineer. In 1986 he got his PhD. in Robotics at St. Petersburg Technical University (Russia) and then worked as a professor teaching CAD/CAM, Robotics for 12 years. He published 30+ papers and made several presentations for conferences related to the Robotics and Artificial Intelligent fields. In 1999 he moved to the US, worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He made other tech. presentations at IBM z/Series Expo, SPEC.org, Southern and Central Europe CMG and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). Author of “Performance Anomaly Detection” class (at CMG.org). Worked 2 years as the Capacity team lead for IBM, worked for SunTrust Bank for 3 years and then at IBM for 3 years as Sr. IT Architect. Now he works for Capital One bank as IT Manager at the Cloud Engineering and since 2015 he is a member of CMG.org Board of Directors. Runs UT channel iTrubin

3 comments:

Igor Trubin said...: That is great!

Couple comments to Adrian:

- You used only 1 month data for base-lining and you grouped data by hour of the day to build Mean, UCL and LCL.

As an actual data to compare with baseline you used other month (selected by the parameter) hourly averaged data (averages).

That works fine if every week day has the similar pattern of CPU usage. It is often not true (depending on application) especially for OLTP apps. Your case can be used if application does the similar (let say batch job) work regardless of week days (weekend or not a weekend – does not really matter). That is possible but rare case.

I do grouping by week hour (168 groups), so 10 am in Sundays is treated differently from 10 am Wednesdays.

- I have never used averaged for the long period (month!) data as an actual data. It is an interesting variation and may have a perfect sense! For exception detection I use just last hours (24 or 168) data point (also averaged but only within particular hour from more granular data). Your way can be used to capturing pattern changes and more applicable for Capacity planning, not for Performance management.... Which is perfectly OK.

Anyway, you proved that your tool can do that in the similar way like other reporting tools do (like BIRT or COGNOS – see how they can be used for building my IT-Control Chart in my blog here, for instance: "Building IT-Control Chart by BIRT against Data from the MySQL Database") and maybe your way is even more convenient... I need to play myself to make any real conclusions/evaluations.; June 03, 2012
Igor Trubin said...: Adrian is responded to my comment:

Thanks for the feedback. This one is a pretty simple example, in a production environment I would add a few more steps, for instance more parameters to be able to select different shifts, say working week, weekends, overnight etc. or I could utilise your method of the 168 groups, which I think I’ll look into a little more anyway, sounds interesting.

The other thing I’ve done when using control charts is to apply the statistical rules, i.e. once the base line is established plot each new days (or hours) value, statistically these should bounce around the mean (i.e. a 50% chance that the value will be above and 50% change below) provided no significant change is occurring. If some significant change starts to occur this can be detected quite quickly, for instance if 3 values in a row above the mean then there is a 87.5% change that this is due to underlying change and not random fluctuations. etc. The probabilities are quite easy to figure out, and it adds some weight to any argument that things are changing. In the past I’ve found this to be a very good early indicator of a degrading system, particularly in an OLTP system like CICS if you can get the response time of each transaction as they occur.; June 03, 2012
Anonymous said...: 5687E6073C
Whatsapp Ücretli Şov
Whatsapp Şov
Telegram Show Kanalı; November 29, 2025

System Management by Exception

Popular Post

_

Sunday, June 3, 2012

Adrian Heald: A simple control chart using Captell Version 6

3 comments:

Post a Comment