
Showing posts with label IT-control chart.

Wednesday, March 6, 2013

IT-Control Charts: How to Read, How to Use
DEMO is below: [embedded video]



YouTube version of the presentation: [embedded video]

Sunday, June 3, 2012

Adrian Heald: A simple control chart using Captell Version 6

At CMG'11 I met Adrian and asked him to show me how his reporting tool “Captell” (www.reportingservices.com) can be used to build MASF Control Charts. Below is his response. Also check the comment on this post for my feedback.
_____________________________________________________________________________
Introduction


A control chart uses data from a specified period to derive an average and upper and lower control values. For this example we are using CPU utilization data from a UNIX machine, collected over the four-month period January through April 2011 and delivered in a CSV file. The baseline period is January, from which we calculate average values and standard deviations for each clock hour. We can then plot our control chart and compare each successive month’s averages with the control to see a clear picture of change.
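
For readers who want to reproduce the calculation outside Captell, here is a minimal R sketch of the same baseline logic; the file name and the column names (timestamp, cpu_util) are my assumptions, not the actual Captell data layout:

# Minimal sketch of the baseline calculation (assumed CSV columns:
# timestamp as "YYYY-MM-DD HH:MM:SS" and cpu_util as a percentage).
cpu <- read.csv("cpu_util.csv", stringsAsFactors = FALSE)
cpu$timestamp <- as.POSIXct(cpu$timestamp, format = "%Y-%m-%d %H:%M:%S")
cpu$hour  <- as.integer(format(cpu$timestamp, "%H"))
cpu$month <- format(cpu$timestamp, "%Y-%m")

# Baseline period: January 2011; mean and standard deviation per clock hour.
jan  <- subset(cpu, month == "2011-01")
avg  <- tapply(jan$cpu_util, jan$hour, mean)
sdev <- tapply(jan$cpu_util, jan$hour, sd)

baseline <- data.frame(hour = as.integer(names(avg)),
                       mean = avg,
                       ucl  = avg + 2 * sdev,  # upper control limit
                       lcl  = avg - 2 * sdev)  # lower control limit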

For more information and sample reports see www.reportingservices.com
or contact
Adrian Heald
on +61 (0)411 238 755
adrian@reportingservices.com


Step 1 – Import the CPU utilization data


The following dialog shows the table definition, with the “Delimited text file” source type selected. Specify a name and folder and choose the source type.


Here we see the text file definition; all that is required is the filename and a specification of the date/time format.


Step 2 – Import and view the data

This window shows the main Captell dialog with the task that imports the data.


And here is a view of the imported data; during the import Captell automatically determines the correct data types.

Step 3 – Create a query to calculate the baseline

This query calculates the average and the average +/- 2 standard deviations for the January data.


The query output.

Step 4 – Create a query to summarise a single month’s data

This query calculates the average CPU for each hour throughout the month selected by the Captell parameter ‘Data\Month’.


The query output:

Step 5 – Create a chart to combine the two queries

This chart shows the baseline average CPU utilisation and upper control limit along with the average values from the current month. Captell’s ability to plot data from different sources, in this case the baseline data and the data from the new month, makes reporting quite easy. The blue line with the square symbols shows the average hourly data for March, well within the control limit and with all hourly values below the baseline average.
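
In R terms (not Captell), the same overlay could be sketched as follows, reusing the cpu and baseline objects from the sketch above; the month being compared is March 2011:

# Sketch: overlay the March hourly averages on the January baseline.
mar     <- subset(cpu, month == "2011-03")
mar.avg <- tapply(mar$cpu_util, mar$hour, mean)

plot(baseline$hour, baseline$ucl, type = "l", lty = 2, col = "red",
     ylim = range(c(baseline$lcl, baseline$ucl, mar.avg)),
     xlab = "Hour of day", ylab = "CPU utilisation (%)",
     main = "March vs. January baseline")
lines(baseline$hour, baseline$lcl, lty = 2, col = "red")   # lower limit
lines(baseline$hour, baseline$mean, col = "black")         # baseline mean
lines(as.integer(names(mar.avg)), mar.avg,                 # current month
      type = "b", pch = 15, col = "blue")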


Step 6 – Change the parameter to compare a different month

Here we can see the parameter changed to April and the resultant chart. The blue line with the square symbols shows the average hourly data for April, mostly above the upper control limit and all but one hour above the January mean, indicating a substantial increase in utilization.




(Posted with Adrian Heald's permission)




Monday, May 7, 2012

SEDS-Lite Presentation at Southern CMG Meeting in the SAS Institute

Last Friday I gave the presentation announced here: SEDS-Lite: Using Open Source Tools (R, BIRT and MySQL) to Report and Analyze Performance Data. It was presented at the Southern CMG Meeting at the SAS Institute, Cary, NC. The presentation slides are linked within the AGENDA and can also be downloaded from HERE.


I plan to write a paper based on this presentation and submit it to this year's CMG'12 conference.


Friday, April 20, 2012

Building IT-Control Chart with COGNOS

I am developing SEDS elements using IBM Cognos. Here is the first result, which is just a POC prototype of the IT-Control Chart report.
I used the test data (a date-hour stamped utilization metric) that I developed to build the same IT-Control Charts with other tools (BIRT, MySQL, R). I have published some information about that in previous blog posts (e.g. R-script to plot IT-Control Chart against MySQL).

This time I developed the simplest metadata package against an ODBC connection to the MySQL database using Cognos Framework Manager and published it in TCR locally on my laptop. Then I used Cognos Report Studio to build the report. The result of running the report is the following:

I got the same result as with R or BIRT, but I noticed some nice features in COGNOS that helped me build it faster and more accurately (e.g. adding the dates on the X-axis).

I am going to mention this progress with some details in my upcoming SCMG presentation:

SEDS-Lite: Using Open Source Tools (R, BIRT and MySQL) to Report and Analyze Performance Data

UPDATE: I will be presenting that again at CMG'12 conference:  http://itrubin.blogspot.com/2012/08/seds-lite-using-open-source-tools-r.html

Tuesday, March 27, 2012

R-Script to Aggregate (ETL to MySQL) Actual data with Base-line data for IT-Control Charts

In my previous post (R-script to plot IT-Control Chart against MySQL) the task was set to write an R-script to pre-process (ETL) the raw date-hour stamped data into the DATA-cubical format for control charting.

Here is the solution:
I simply transformed the already developed SQL script into an RODBC-based R-script, which can be seen below:

[Screenshot of the RODBC-based R-script]
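
Since the script was published only as a screenshot, here is a minimal sketch of the same idea, with heavy assumptions: an ODBC DSN named "MySQLti", a ServerMetrics.CPUutil table with date and CPUutil columns, and a baseline/actual split at 2011-04-02 (the filter used in the BIRT posts):

library(RODBC)  # ODBC bridge from R to the MySQL database

ch <- odbcConnect("MySQLti")  # DSN name is an assumption

# Rebuild the result table by running the aggregation SQL from R:
# baseline statistics and actual averages, grouped by week hour.
sqlQuery(ch, "DROP TABLE IF EXISTS ServerMetrics.ActualVsHistorical")
sqlQuery(ch, "
  CREATE TABLE ServerMetrics.ActualVsHistorical AS
  SELECT WEEKDAY(date) * 24 + HOUR(date) AS weekhour,
         AVG(CASE WHEN date <= '2011-04-02' THEN CPUutil END) AS baseline,
         STD(CASE WHEN date <= '2011-04-02' THEN CPUutil END) AS sdev,
         AVG(CASE WHEN date >  '2011-04-02' THEN CPUutil END) AS actual
  FROM ServerMetrics.CPUutil
  GROUP BY weekhour")

# Read the table back into a data frame.
ActualVsHistorical <- sqlQuery(ch,
  "SELECT * FROM ServerMetrics.ActualVsHistorical ORDER BY weekhour")
odbcClose(ch)
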
The result of running the script is the "ActualVsHistorical" table in the ServerMetrics database on MySQL, with data identical to that used for plotting the IT-Control Chart published in the previous post. The data itself can be seen by simply typing the data frame name in the R console window:

[R console output of the ActualVsHistorical data frame]

So, all the main elements of the SEDS-Lite project have now been prototyped and published in my posts. Maybe one more task is left: to illustrate in R how the exception list of objects (servers), based on EV meta-metric filtering, can be created as part of anomaly detection. So far that has been done and published in this blog only in the "DB2"-like SQL format to run within BIRT. See the post about that here: UCL=LCL: How many standard deviations do we use for Control Charting? Use ZERO!
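
As a hint of how that last task might look in R, here is a toy sketch with hypothetical server names and EV values (not real SEDS output):

# Toy sketch: filter an exception list of servers by EV magnitude.
# The server names, EV numbers, and threshold are all hypothetical.
ev <- data.frame(server = c("unx01", "unx02", "unx03", "unx04"),
                 EV     = c(0.3, 12.7, -8.4, 1.1))

threshold  <- 5                              # severity threshold (assumption)
exceptions <- subset(ev, abs(EV) > threshold)
exceptions  # the anomaly list: unx02 and unx03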


Wednesday, March 21, 2012

R-script to plot IT-Control Chart against MySQL

Continuing to play with open-source tools for building SEDS elements, I have developed a simple R-script to plot the IT-Control Chart against data stored in a MySQL database.

I used the same MySQL data that had already been built and used for IT-Control Charting by the BIRT reporting system; see the following post about how that was done: Building IT-Control Chart by BIRT against Data from the MySQL Database. I used the RODBC package to connect to and query the MySQL database through the MySQL ODBC driver.

Actually, I just slightly modified the R-script that I wrote for my "Power of Control Chart" workshop. That script can be found in the following post: IT-Chart: The Best Way to Visualize IT Systems Performance.

Here is my new script:

[R-script screenshot]
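
For those who cannot read the screenshot, a sketch of such a script might look like this; the DSN, table, and column names follow the assumptions of the other sketches in this blog, and the 3-sigma limits are a choice, not necessarily the ones in the original script:

library(RODBC)

ch <- odbcConnect("MySQLti")
d  <- sqlQuery(ch,
  "SELECT weekhour, baseline, sdev, actual
   FROM ServerMetrics.ActualVsHistorical
   ORDER BY weekhour")
odbcClose(ch)

ucl <- d$baseline + 3 * d$sdev            # upper control limit (3 sigma)
lcl <- pmax(d$baseline - 3 * d$sdev, 0)   # lower limit, floored at zero

plot(d$weekhour, ucl, type = "l", lty = 2, col = "red",
     ylim = range(c(lcl, ucl, d$actual), na.rm = TRUE),
     xlab = "Week hour", ylab = "CPU utilization (%)",
     main = "IT-Control Chart")
lines(d$weekhour, lcl, lty = 2, col = "red")
lines(d$weekhour, d$baseline, col = "darkgreen")    # baseline mean
lines(d$weekhour, d$actual, col = "blue", lwd = 2)  # actual (last) data
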
Here is the result:

[IT-Control Chart produced by the R-script]

It is practically identical to what was done by BIRT (see the link to the BIRT-based picture here).

If you are a programmer, you will notice how much easier it is to build charts with R than with BIRT (a menu-based report generator aimed at non-programmers).

The data used for this exercise had already been pre-processed into the DATA-cubical format from the raw date-hour stamped data (see the SQL script for that here). But what about doing this pre-processing in R as well?

That is the next task ... (it could be your homework ;). The simplest approach is again to use the RODBC package just to run the SQL script mentioned above within the R system. Another, better approach is to do it using natural R data manipulation techniques.
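
As a starting hint for that homework, here is a sketch of the "natural R" approach with aggregate() and merge(); the column names and the baseline/actual split date are assumptions carried over from the earlier sketches:

# Sketch: build the ActualVsHistorical data frame in pure R.
cpu <- read.csv("cpu_util.csv", stringsAsFactors = FALSE)
cpu$timestamp <- as.POSIXct(cpu$timestamp, format = "%Y-%m-%d %H:%M:%S")
cpu$weekhour  <- as.integer(format(cpu$timestamp, "%u")) * 24 +
                 as.integer(format(cpu$timestamp, "%H"))  # %u: 1 = Monday

cutoff <- as.POSIXct("2011-04-02")           # baseline/actual split
past   <- subset(cpu, timestamp <= cutoff)
act    <- subset(cpu, timestamp >  cutoff)

# Baseline mean and standard deviation per week hour.
stats <- aggregate(cpu_util ~ weekhour, past,
                   function(x) c(mean(x), sd(x)))
ref   <- data.frame(weekhour = stats$weekhour,
                    baseline = stats$cpu_util[, 1],
                    sdev     = stats$cpu_util[, 2])

# Actual average per week hour, then join with the baseline.
cur <- aggregate(cpu_util ~ weekhour, act, mean)
names(cur)[2] <- "actual"
ActualVsHistorical <- merge(ref, cur, by = "weekhour")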

Friday, October 7, 2011

EV-Control Chart

I introduced the EV meta-metric in 2001 as a measure of anomaly severity. EV stands for Exception Value; more explanation of the idea can be found here: The Exception Value Concept to Measure Magnitude of Systems Behavior Anomalies
Basically, it is the difference (integral) between the actual data and the control limits. So far I have used EV data mostly to filter out real issues or for automatic hidden trend recognition. For instance, in my CMG'08 paper "Exception Based Modeling and Forecasting" I plotted that metric using Excel to explain how it could be used to recognize the starting point of a new trend. Here is the picture from that paper, where EV is called "Extra Volume" and, for the particular parent metric (CPU utilization), is named ExtraCPUtime:

The first EV meta-metric chart

But just plotting that meta-metric and/or its two components (EV+ and EV-) over time gives a valuable picture of system behavior. If the system is stable, the chart should be boring, showing near-zero values all the time. So with this chart it is very easy (I believe even easier than with MASF Control Charts) to recognize unusual and statistically significant increases or decreases in the actual data at a very early stage (early warning!).
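
In R terms, the EV components could be sketched as follows, assuming the ActualVsHistorical layout (weekhour, baseline, sdev, actual) used in the other sketches; the 3-sigma limits are again an assumption:

# Sketch: EV+ and EV- as excursions of actual data beyond the limits.
ucl <- ActualVsHistorical$baseline + 3 * ActualVsHistorical$sdev
lcl <- ActualVsHistorical$baseline - 3 * ActualVsHistorical$sdev
act <- ActualVsHistorical$actual

ev.plus  <- pmax(act - ucl, 0)  # excess above the upper control limit
ev.minus <- pmax(lcl - act, 0)  # shortfall below the lower control limit

# A stable system keeps both components near zero; plot them over the
# week hours (positive bars above UCL, negative bars below LCL).
barplot(ev.plus - ev.minus, names.arg = ActualVsHistorical$weekhour,
        xlab = "Week hour", ylab = "EV", main = "EV-Chart (sketch)")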

Here are examples of the EV-chart against the same sample data used in a few previous posts:
1. Excel example: 

2.  BIRT/MySQL example as a continuation of the exercise from the previous post:

IT-Control chart vs. EV-Chart
Here are the BIRT screenshots that illustrate how it is built:

A. An additional query to calculate EV, written directly in the additional BIRT Data Set object called "Data set for EV Chart":

SQL query to calculate the EV meta-metric from the data kept in the MySQL table

B. Then an additional bar-chart object is added to the report and bound to the new "Data set for EV Chart":
The resulting report is already shown here.





Tuesday, October 4, 2011

Building IT-Control Chart by BIRT against Data from the MySQL Database

This is just another way to build an IT-Control Chart, assuming the raw data are in a real database like MySQL. In this case some SQL scripting is used.

1. The raw data is hourly CPU utilization, actually the same as in the previous posts: BIRT based Control Chart and One Example of BIRT Data Cubes Usage for Performance Data Analysis (see the raw data picture here).

2. That raw data needs to be uploaded to a table (CPUutil) in the MySQL schema (ServerMetrics) using the following script (sqlScriptToUploadCSVforSEDS.sql):

The uploaded data can be seen at the bottom of the picture.
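
Because the upload script is shown only as a picture, here is a hedged sketch of the same step driven from R via RODBC; the CSV path, the column list, and the availability of LOCAL INFILE on the server are all assumptions:

library(RODBC)

ch <- odbcConnect("MySQLti")
# Bulk-load the CSV into the CPUutil table (requires LOCAL INFILE to be
# enabled on the MySQL server and in the ODBC driver).
sqlQuery(ch, "
  LOAD DATA LOCAL INFILE 'CPUutil.csv'
  INTO TABLE ServerMetrics.CPUutil
  FIELDS TERMINATED BY ','
  IGNORE 1 LINES
  (date, CPUutil)")
odbcClose(ch)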

3. Then the output (result) data (the ActualVsHistorical table) is built using the following script (sqlScriptToControlChartforSEDS.sql):
A fragment of the result data is also shown at the bottom of the picture. Everything is now ready for building the IT-Control Chart, and the data is actually the same as that used in the BIRT based Control Chart post, so the result should be the same as well. Below is a more detailed explanation of how that was done.

4. First, using BIRT, the connection to the MySQL database is established (to MySQLti, with the schema ServerMetrics and the table ActualVsHistorical):

5. Then the chart is developed the same way as in the BIRT based Control Chart post:


6. A nice thing is that in BIRT you can specify report parameters, which can then be used in any constants, including for filtering (to change the baseline or to provide server or metric names). Finally, the report should be run to get the following result, which is almost identical to the one built for the BIRT based Control Chart post:




Thursday, September 29, 2011

Power of Control Charts and IT-Chart Concept (Part 1)


This is a video presentation about Control Charts. It is based on a workshop I have already run a few times. It shows how to read and use Control Charts for reporting and analyzing IT systems performance (e.g. servers, applications). My original IT-(Control) Chart concept within SEDS (Statistical Exception Detection System) is also presented.

Part 2 will be about how to build a control chart using R, SAS, BIRT and just ...


If anybody is interested, I would be happy to conduct this workshop again, remotely via the Internet or in person. Just put a request or a comment here.



UPDATE: See the version of this presentation with Russian narration:

Thursday, September 22, 2011

One Example of BIRT Data Cubes Usage for Performance Data Analysis

I got a comment on my previous post “BIRT based Control Chart“ with questions about how the data are actually prepared in BIRT for control charting. Addressing this request, I’d like to share how I use a BIRT Cube to populate data into a CrossTab object, which is then used for building a control chart.


As I have already explained in my CMG paper (see IT-Control Chart), the data that describes the IT-Control Chart (or MASF control chart) actually has three dimensions: two time dimensions and one measurement, the metric, as seen in the picture at the left. The control chart is just a projection onto a 2D cut, with the actual (current or last) data overlaid. So, naturally, the OLAP Cubes data model (Data Cubes) is suitable for grouping and summarizing time-stamped data into a crosstable for further analysis, including building a control chart. In past SEDS implementations I did not use the Cubes approach and had to transform time-stamped data for control charting using basic SAS steps and procs. Now I find that using Data Cubes is somewhat simpler and in some cases requires no programming at all if modern BI tools (such as BIRT) are used.
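
To make the projection idea concrete, here is a small R sketch that folds date-hour stamped data onto the weekday-hour plane, which is essentially the crosstab the Cube feeds to the chart; the file and column names are assumptions:

# Sketch: project date-hour stamped data onto weekday x hour.
cpu <- read.csv("cpu_util.csv", stringsAsFactors = FALSE)
cpu$timestamp <- as.POSIXct(cpu$timestamp, format = "%Y-%m-%d %H:%M:%S")
cpu$weekday <- as.integer(format(cpu$timestamp, "%u"))  # 1 = Monday
cpu$hour    <- as.integer(format(cpu$timestamp, "%H"))

# The "cube": average the measurement over the two time dimensions.
# A control chart is then one 2D cut of this table plus control limits.
cube <- tapply(cpu$cpu_util, list(cpu$weekday, cpu$hour), mean)
round(cube, 1)  # rows = weekday, columns = hour of day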

Below are some screenshots with comments that illustrate the process of building the IT-Control Chart using a BIRT Cube.



The data source (input data) is a table with a single date/hour stamped metric with at least 4 months of history (in this case, the CPU utilization of some Unix box). It could be in any database format; in this particular example it is the following CSV file:

[Screenshot of the input CSV file]

The result (in the form of a BIRT report designer preview) is shown in the following picture (where UCL is the Upper Control Limit; LCL is not included for simplicity):

Before building the Cube, the following three data sets were built using the BIRT “Data Explorer”:
(1) The reference set, or baseline (just “Data Set” in the picture), is based on the input raw data with some filtering and computed columns (weekday and weekhour), and
(2) the Actual data set, which is the same but has a different filter: (raw[“date”] Greater “2011-04-02”)


(3) To combine both data sets for comparing baseline vs. actual, “Data Set1” is built as a “Joint Data Set” using the following BIRT Query builder:
Then the Data Cube is built in the BIRT Data Cube Builder with the structure shown on the following screen:
Note that only one dimension is used here (weekhour), as that is what is needed for the cross table report below.

The next step is building the report, starting with a Cross Table (picked as an object from the BIRT Report designer “Palette”):
The picture above also shows which fields are chosen from the Cube for the cross table.

The final step is dropping a “Chart” object from the “Palette” and adding the UCL calculation using the Expression Builder for an additional Value (Y) Series:

To see the result, one just needs to run the report or use the "Preview" tab in the report designer window:

FINAL COMMENTS

- The BIRT report package can be exported and run under any portal (e.g. IBM TCR).
- It makes sense to specify and use additional Cube dimensions, such as server name and/or metric name.
- The report can be designed in BIRT with parameters; for example, a good idea is to use the server name as a report parameter.
- To follow the SEDS idea and base the reporting process on exceptions, a preliminary exception detection step is needed; it can again be done within a BIRT report using an SQL script similar to the one published in one of the previous posts.