Showing posts with label SPC. Show all posts

Thursday, April 5, 2012

Prehistory of SEDS: Virtual CMG'90 Trip Report about Control Chart Usage. Part 1.

Using the keyword "Control Chart", I found a few very old CMG papers in the www.CMG.org knowledge base that discuss applying the classical SPC approach to computer performance data.

Here is the first one:

 Fine-Grain Analysis (FGA): A Methodology for Analyzing Intermittent Performance Problems
  By Robert Berry & Jeffrey Hedglin 

 

The paper describes which mainframe metrics are good candidates for control charting. They should be of two types: (a) a performance quality measure, which sounds like a modern KPI (e.g. response time); and (b) system performance metrics (e.g. CPU queue length). The paper then describes how an intermittent problem can be detected just by plotting SPC control charts for both types of metrics in sync (correlated).
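To make the idea concrete, here is a minimal Python sketch of "in sync" control charting. It is my own illustration, not the method from the paper: it assumes classical mean +/- 3 sigma limits taken from a baseline period, and the function names and the sample numbers are made up.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits as mean -/+ k sample standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center + k * spread


def out_of_control(series, limits):
    """Return the set of indices whose values fall outside the limits."""
    lower, upper = limits
    return {i for i, x in enumerate(series) if x < lower or x > upper}


def correlated_exceptions(quality, system, quality_baseline, system_baseline, k=3.0):
    """Return the intervals where BOTH metrics are out of control at once."""
    q = out_of_control(quality, control_limits(quality_baseline, k))
    s = out_of_control(system, control_limits(system_baseline, k))
    return sorted(q & s)


# Hypothetical data: response time (quality measure) and CPU queue length
# (system metric), sampled over the same intervals.
resp_baseline = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1]
queue_baseline = [2.0, 2.2, 1.8, 2.1, 1.9, 2.0, 2.3, 1.7]
resp_time = [1.0, 1.1, 3.5, 1.0, 0.9, 3.2]
cpu_queue = [2.0, 2.1, 9.0, 2.2, 6.0, 8.5]

print(correlated_exceptions(resp_time, cpu_queue, resp_baseline, queue_baseline))
# prints [2, 5]
```

Interval 4 is an exception for the CPU queue only, so it is not reported; only the intervals where both charts break their limits together point to a likely intermittent problem.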

I use that approach a lot now, but with the MASF type of control chart and specifically my IT-Control Charts. BTW, I am now writing my next CMG paper and plan to include a couple of very persuasive examples of correlated IT-Control Charts, such as the number of concurrent user LOGONS vs. the number of physical CPUs used by LPARs on a p770 AIX frame...

To be continued....

Friday, October 7, 2011

EV-Control Chart

I introduced the EV meta-metric in 2001 as a measure of anomaly severity. EV stands for Exception Value; more explanation of the idea can be found here: The Exception Value Concept to Measure Magnitude of Systems Behavior Anomalies
Basically, it is the difference (integral) between the actual data and the control limits. So far I have used EV data mostly to filter out real issues or for automatic recognition of hidden trends. For instance, in my CMG'08 paper "Exception Based Modeling and Forecasting" I plotted that metric in Excel to explain how it can be used to recognize the starting point of a new trend. Here is the picture from that paper, where EV is called "Extra Volume" and, for the particular parent metric (CPU utilization), is named ExtraCPUtime:

The EV meta-metric first chart 

But just plotting that meta-metric and/or its two components (EV+ and EV-) over time gives a valuable picture of system behavior. If the system is stable, that chart should be boring, showing a near-zero value all the time. So with that chart it would be very easy (I believe even easier than with MASF Control Charts) to recognize an unusual and statistically significant increase or decrease in the actual data at a very early stage (Early Warning!).
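As a rough illustration of how EV+ and EV- can be computed, here is a minimal Python sketch. It assumes equally spaced samples, so the integral reduces to a sum, and it assumes the convention that EV- is reported as a negative number; the function name, the `step` parameter, and the sample data are hypothetical and are not taken from the Excel or BIRT reports.

```python
def exception_value(actual, lower, upper, step=1.0):
    """Return (ev_plus, ev_minus): the area of the actual data outside the limits.

    ev_plus  accumulates the excess above the upper control limit.
    ev_minus accumulates the shortfall below the lower control limit
             (kept negative here, which is an assumed convention).
    """
    ev_plus = 0.0
    ev_minus = 0.0
    for x, lo, hi in zip(actual, lower, upper):
        if x > hi:
            ev_plus += (x - hi) * step
        elif x < lo:
            ev_minus += (x - lo) * step
    return ev_plus, ev_minus


# Hypothetical CPU utilization samples with flat limits of 30 and 70.
actual = [50, 55, 80, 90, 52, 20]
upper = [70] * len(actual)
lower = [30] * len(actual)

ev_plus, ev_minus = exception_value(actual, lower, upper)
print(ev_plus, ev_minus, ev_plus + ev_minus)
# prints 30.0 -10.0 20.0
```

For a stable system every sample stays between the limits, so both components stay at zero, which is the "boring" chart described above.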

Here is an example of that EV-chart built against the same sample data used in a few previous posts:
1. Excel example: 

2.  BIRT/MySQL example as a continuation of the exercise from the previous post:

IT-Control chart vs. EV-Chart
Here are the BIRT screenshots that illustrate how that is built:

A. An additional query that calculates EV is written directly in an additional BIRT Data Set object called “Data set for EV Chart”:
 SQL query to calculate the EV metric from the data kept in a MySQL table

B. Then an additional bar-chart object, bound to that new “Data set for EV Chart”, is added to the report:
The resulting report is already shown here.





Monday, November 15, 2010

My CMG'10 presentation - "IT-Control Charts"

I will go to the CMG conference this time for only one day, just to present my paper "IT-Control Charts" on Wednesday, December 8th, at 10:30. You are WELCOME!

See it in the CMG conference agenda: http://www.cmg.org/cgi-bin/agenda_2010.pl?action=more&token=5030

For Russian readers (information in Russian is here), I posted about that event on my Russian mirror blog: http://ukor.blogspot.com/2010/11/cmg10_15.html

Monday, October 18, 2010

Statistical Process Control to Improve IT Services - one more CMG'10 paper related to this blog subject

Using Statistical Process Control to Improve the Quality and Delivery of IT Services
Nathan Shiffman
Armin Roeseler, Townsend Analytics
Mike Pecak
This session presents a framework for the delivery of IT services based on Continuous Quality Improvement (CQI). Starting with the Capability Maturity Model (CMM), we develop a process oriented approach based on Statistical Process Control (SPC). We apply the framework to the Change Management process of a large IT environment for a trading software firm, and show how failure-rates of the Change Management process were reduced dramatically.

Monday, June 7, 2010

Near-Real-Time IT-Control Chart R-Simulation

UPDATE: The following free web tool for building IT-Control Charts is now available:
                           www.Perfomalist.com


See more explanation

Review of IT Control Chart

Wednesday, October 21, 2009

Lower Control Limit Usage Examples for IT Capacity Management

I recently posted the following question as a LinkedIn discussion topic in the "Statistical Process Control" group: "Does it make any sense to use Control Charts for capacity management?" and got one pessimistic comment, which included the following statement:

"...The only situation I can think of using a control chart for capacity is if you had a piece of equipment that if over utilized would cause damage or premature wear in which case you would only have an upper control..."

I disagree. My system (SEDS) has a special part (updated lists) called "Unusual Capacity Usage OUTSIDERS" that helps capture serious server issues, such as a database going down, an LPAR migrating off a host, and other unusual capacity releases that are not necessarily good things.

The following control charts from my upcoming CMG'09 workshop presentation are good illustrations of the types of findings SEDS captures:

1. VMware host issue (VM migration):



2. Unisys server database is down:




3. Mainframe application unusually low CPU usage:
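The lower-limit check behind findings like these can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the SEDS code: it groups historical samples by hour in the MASF style, uses mean - 3 sigma as the lower control limit for each hour, and works on made-up data with hypothetical function names.

```python
from collections import defaultdict
from statistics import mean, stdev


def hourly_limits(history, k=3.0):
    """Build {hour: (lower, upper)} from (hour, value) samples of past days."""
    groups = defaultdict(list)
    for hour, value in history:
        groups[hour].append(value)
    limits = {}
    for hour, values in groups.items():
        if len(values) >= 2:  # stdev needs at least two samples
            center, spread = mean(values), stdev(values)
            limits[hour] = (center - k * spread, center + k * spread)
    return limits


def capacity_releases(current, limits):
    """Return the (hour, value) samples that fall BELOW the lower limit."""
    return [(h, v) for h, v in current if h in limits and v < limits[h][0]]


# Hypothetical CPU utilization history for the 9:00 and 10:00 hours.
history = [(9, 60), (9, 62), (9, 58), (9, 61),
           (10, 70), (10, 72), (10, 68), (10, 71)]
current = [(9, 59), (10, 5)]  # at 10:00 the usage suddenly drops

print(capacity_releases(current, hourly_limits(history)))
# prints [(10, 5)]
```

A drop like the one at 10:00 would never trigger an upper-limit alert, yet it is exactly the kind of event (a database down, a workload migrated away) that deserves attention.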


Thursday, June 7, 2007

System Management by Exception

Greetings!

To keep up the discussion about how to manage computer systems by exception (e.g. by using SPC, APC, MASF, 6-SIGMA, SETDS and other techniques), I run this blog and also publish and present white papers at www.CMG.org. Please take a look at the following set of CMG papers related to the Statistical Exception Detection System (SEDS or SETDS):

2017 - The Model Factory - Correlating Server and Database Utilization with Customer Activity

2016 - Is your Capacity available? 

2012 - SEDS-Lite:  Using Open Source Tools (R, BIRT and MySQL) to Report and Analyze Performance Data 

2008 - Exception Based Modeling and Forecasting

2005 - Capturing Workload Pathology by Statistical Exception Detection System 

2004 - Mainframe Global and Workload Level Statistical Exception Detection System Based on MASF

2003 - Disk Subsystem Capacity Management Based on Business Drivers I/O Performance Metrics and MASF

2002 - Global and Application Levels Exception Detection System, Based on MASF Technique

2001 - Exception Detection System, Based on the Statistical Process Control Concept