This blog relates to experiences in the Systems Capacity and Availability areas, focusing on statistical filtering and pattern recognition and BI analysis and reporting techniques (SPC, APC, MASF, 6-SIGMA, SEDS/SETDS and other)
Popular Post
-
I have got the comment on my previous post “ BIRT based Control Chart “ with questions about how actually in BIRT the data are prepared for ...
-
Your are welcome to post to this blog any message related to the Capacity, Performance and/or Availability of computer systems. Just put you...
_
Thursday, September 29, 2011
Power of Control Charts and IT-Chart Concept (Part 1)
This is the video presentation about Control Charts. It is based on my workshop I have already run a few times. It shows how to read and use Control Charts for reporting and analyzing IT systems performance (e.g. servers, applications) . My original IT-(Control) Chart concept within SEDS (Statistical Exception Detection System) is also presented.
The Part 2 will be about "How to build" control chart using R, SAS, BIRT and just
If anybody interested I would be happy to conduct this workshop again remotely via Internet or in person. Just put a request or just a comment here.
UPDATE: See the version of this presentation with the Russian narration:
Labels:
"control chart" SPC R,
Control Chart,
IT-Chart,
IT-control chart,
MASF,
Near-Real-Time IT Control Charts,
SEDS,
workshop
He started in 1979 as IBM/370 system engineer. In 1986 he got his PhD. in Robotics at St. Petersburg Technical University (Russia) and then worked as a professor teaching CAD/CAM, Robotics for 12 years. He published 30+ papers and made several presentations for conferences related to the Robotics and Artificial Intelligent fields. In 1999 he moved to the US, worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He made other tech. presentations at IBM z/Series Expo, SPEC.org, Southern and Central Europe CMG and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). Author of “Performance Anomaly Detection” class (at CMG.com). Worked 2 years as the Capacity team lead for IBM, worked for SunTrust Bank for 3 years and then at IBM for 3 years as Sr. IT Architect. Now he works for Capital One bank as IT Manager at the Cloud Engineering and since 2015 he is a member of CMG.org Board of Directors. Runs UT channel iTrubin
Friday, September 23, 2011
How To Build IT-Control Chart - Use the Excel Pivot Table!
Continuing the topic of the previous post “One Example of BIRT Data Cubes Usage for Performance Data Analysis” I am showing here the way how to transform raw data to a “SEDS DB” format suitable for IT- Control Chart building or for exception detection. Based on the published on this blog SEDS-lite introduction it is “...building data for charting/detecting” task which is seen on the picture:
But in this case it is strictly manual process (unless someone wants to use VBA to automate that within MS Excel….) and requires the same basically approach as Data Cube/CrossTable usage in BIRT and in MS Excel it is called “PivotTable and PivotChart report“ listed under “data” menu item.
Below are a few screenshots that could help someone who is a bit familiar with EXCEL to understand how to build IT-Control Charts in order to analyze performance data in SEDS terms.
The input data is the same as in the previous post – just date/hour stamped system utilization metric (link to it). Additionally three calculated variables were added: Weekday (using Excel WEEKDAY () function) and weekhour as seen on the next picture:
Then the pivot table was built as shown on the next screenshot against raw data plus calculated weekhour field, which is actually is specified in “row” section of Pivot Table Layout Wizard (it is a bit similar with CrossTable object in BIRT; indeed, the Excel Pivot Table is the another way to work with Data Cubes too!):
Then three other columns were added right next to the pivot table to be able to compare Actual vs. Base-line and calculate Control limits (UCL and LCL). To do that, the “CPU util. Actual” data were referenced from the raw /CPUdata/ sheet where the last week data considered as Actual. Control limits calculation was done by usual spreadsheet formula and the picture shows that formula for UCL.
The last step was to build a chart against the data range, which includes pivot table and those three additional fields. See result IT Control Chart on the final picture:
Do you see where exceptions (anomalies) happened there?
Note that is IT-Control chart where the last day with actual data at the very right last 24 hours on Saturday. So that report made by Excel or BIRT is good to run once a week (e.g. by Sundays before work hours) to get all last week exceptions. To be more dynamic this report should be a bit modified (by adding "refreshing" birder) to run it daily, so minor exception first happened in the Thursday could be captured at least on Friday morning and one could make some proactive measures to avoid overutilization issue the chart shows for Friday and especially Saturday. The most dynamic way is to run that hourly (Excel is not good for that - use BIRT!) to be able to react on the first exception with a few next hours! See live example how that's suppose to be here: http://youtu.be/NTOODZAccvk or here: http://youtu.be/cQ4bk1HNuRk
But in this case it is strictly manual process (unless someone wants to use VBA to automate that within MS Excel….) and requires the same basically approach as Data Cube/CrossTable usage in BIRT and in MS Excel it is called “PivotTable and PivotChart report“ listed under “data” menu item.
Below are a few screenshots that could help someone who is a bit familiar with EXCEL to understand how to build IT-Control Charts in order to analyze performance data in SEDS terms.
The input data is the same as in the previous post – just date/hour stamped system utilization metric (link to it). Additionally three calculated variables were added: Weekday (using Excel WEEKDAY () function) and weekhour as seen on the next picture:
/CPUdata/ sheet |
/PivotForITcontrolChart/ sheet |
The last step was to build a chart against the data range, which includes pivot table and those three additional fields. See result IT Control Chart on the final picture:
Do you see where exceptions (anomalies) happened there?
Note that is IT-Control chart where the last day with actual data at the very right last 24 hours on Saturday. So that report made by Excel or BIRT is good to run once a week (e.g. by Sundays before work hours) to get all last week exceptions. To be more dynamic this report should be a bit modified (by adding "refreshing" birder) to run it daily, so minor exception first happened in the Thursday could be captured at least on Friday morning and one could make some proactive measures to avoid overutilization issue the chart shows for Friday and especially Saturday. The most dynamic way is to run that hourly (Excel is not good for that - use BIRT!) to be able to react on the first exception with a few next hours! See live example how that's suppose to be here: http://youtu.be/NTOODZAccvk or here: http://youtu.be/cQ4bk1HNuRk
By the way, I plan to prepare one another workshop type of presentation to demonstrate the technique discussed in my last posts and also to share actual reports maybe during some CMG.org events in the nearest future...
He started in 1979 as IBM/370 system engineer. In 1986 he got his PhD. in Robotics at St. Petersburg Technical University (Russia) and then worked as a professor teaching CAD/CAM, Robotics for 12 years. He published 30+ papers and made several presentations for conferences related to the Robotics and Artificial Intelligent fields. In 1999 he moved to the US, worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He made other tech. presentations at IBM z/Series Expo, SPEC.org, Southern and Central Europe CMG and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). Author of “Performance Anomaly Detection” class (at CMG.com). Worked 2 years as the Capacity team lead for IBM, worked for SunTrust Bank for 3 years and then at IBM for 3 years as Sr. IT Architect. Now he works for Capital One bank as IT Manager at the Cloud Engineering and since 2015 he is a member of CMG.org Board of Directors. Runs UT channel iTrubin
Thursday, September 22, 2011
One Example of BIRT Data Cubes Usage for Performance Data Analysis
I have got the comment on my previous post “BIRT based Control Chart“ with questions about how actually in BIRT the data are prepared for Control Charting. Addressing this request I’d like to share how I use BIRT Cube to populate data to CrossTab object which was used then for building a control chart.
As I have already explained in my CMG paper (see IT-Control Chart), the data that describes the IT-Control Chart (or MASF control chart) has actually 3 dimensions (actually, it has 2 time dimensions and one measurement - metric as seen in the picture at the left). And the control chart is a just a projection to the 2D cut with actual (current or last) data overlaying. So, naturally, the OLAP Cubes data model (Data Cubes) is suitable for grouping and summarizing time stamped data to a crosstable for further analysis including building a control chart. In the past SEDS implementations I did not use Cubes approach and had to transform time stamped data for control charting using basic SAS steps and procs. Now I found that Data Cubes usage is somewhat simpler and in some cases does not require a programming at all if the modern BI tools (such as BIRT) are used.
Data source (Input data) is a table with date/hour stamped single metric with at least 4 months history (in this case it is the CPU utilization of some Unix box). That could be in any database format; in this particular example it is the following CSV file:
(2) the Actual data set which is the same but having the different filter: (raw[“date”} Greater “2011-04-02”)
As I have already explained in my CMG paper (see IT-Control Chart), the data that describes the IT-Control Chart (or MASF control chart) has actually 3 dimensions (actually, it has 2 time dimensions and one measurement - metric as seen in the picture at the left). And the control chart is a just a projection to the 2D cut with actual (current or last) data overlaying. So, naturally, the OLAP Cubes data model (Data Cubes) is suitable for grouping and summarizing time stamped data to a crosstable for further analysis including building a control chart. In the past SEDS implementations I did not use Cubes approach and had to transform time stamped data for control charting using basic SAS steps and procs. Now I found that Data Cubes usage is somewhat simpler and in some cases does not require a programming at all if the modern BI tools (such as BIRT) are used.
Below are the some screenshots with comments that illustrates the process of building the IT-Control Chart by using BIRT Cube.
Data source (Input data) is a table with date/hour stamped single metric with at least 4 months history (in this case it is the CPU utilization of some Unix box). That could be in any database format; in this particular example it is the following CSV file:
The result (in the form of BIRT report designer preview) is on the following picture:(Where UCL – Upper Control Limit; LCL is not included for simplicity)
Before building the Cube the three following data sets were built using BIRT “Data Explorer”:
(1) The Reference set or base-line (just “Data Set” on the picture) is based on the input raw data with some filtering and computed columns (weekday and weekhour) and (2) the Actual data set which is the same but having the different filter: (raw[“date”} Greater “2011-04-02”)
(3) To combine both data sets for comparing base-line vs. actual, the “Data Set1” is built as a “Joint Data Set” by the following BIRT Query builder:
Then the Data Cube was built in the BIRT Data Cube Builder with the structure shown on the following screen:
Note only one dimension is used here – weekhour as that is needed for Cross table report bellow.
The next step is building report starting with Cross Table (which is picked as an object from BIRT Report designer “Pallete”):
The picture above shows also what fields are chosen from Cube to Cross table.
The final step is dropping “Chart” object from “Palette” and adding UCL calculation using Expression Builder for additional Value (Y) Series:
To see the result one needs just to run the report or to use a "preview' tab on the report designer window:
FINAL COMMENTS
- The BIRT report package can be exported and submitted for running under any portals (e.g. IBM TCR).
- Additional Cube dimensions makes sense to specify and use, such as server name or/and metric name.
- The report can be designed in BIRT with some parameters. For example, good idea is to use a server name as the report parameter.
- To follow the “SEDS” idea and to have the reporting process based on exceptions, the preliminary exception detection step is needed and can be done again within a BIRT report using the SQL script similar with published in one of the previous post:
He started in 1979 as IBM/370 system engineer. In 1986 he got his PhD. in Robotics at St. Petersburg Technical University (Russia) and then worked as a professor teaching CAD/CAM, Robotics for 12 years. He published 30+ papers and made several presentations for conferences related to the Robotics and Artificial Intelligent fields. In 1999 he moved to the US, worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He made other tech. presentations at IBM z/Series Expo, SPEC.org, Southern and Central Europe CMG and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). Author of “Performance Anomaly Detection” class (at CMG.com). Worked 2 years as the Capacity team lead for IBM, worked for SunTrust Bank for 3 years and then at IBM for 3 years as Sr. IT Architect. Now he works for Capital One bank as IT Manager at the Cloud Engineering and since 2015 he is a member of CMG.org Board of Directors. Runs UT channel iTrubin
Saturday, September 17, 2011
BIRT based Control Chart
Recently implementing some solution using IBM TCR I have noticed that one of the default reports in TCR/BIRT is a Control Chart in the classical (SPC) version. Looks like that was one of the requirements for ability to build a consistent reports using TCR/BIRT as it's written here: Tivoli Common Reporting Enablement Guide
So I have built a few TCR reports with control chart against Tivoli performance data and that was somewhat useful.
I believe the IT-Control Chart (see my post about that type of control chart here) would give much more value for analyzing time stamped historical data. Is that possible to build using BIRT?
The BIRT is open source free BI tool (can be downloaded from here). I have downloaded and installed that on my laptop and have built a few reports for one of my customers. One of them was to filter out the exceptionally "bad" objects (servers) using EV criteria (see the linked post here).
Then I have built the IT-Control chart using BIRT. Below is the result:
Yes, it is possible with some limitation I have noticed in the current version of BIRT report designer. You could see it if you compare that with oher IT-Control Charts I have build using R (See example here), SAS (Example here) or EXCEL (here).
Anyway, could you see how that chart reports pro-actively on an issue?
So it is another way (not to program like in R or SAS and not to make manually like in EXCEL) to build IT-Control charts. After it is built that could be submitted to TCR (or other reporting portals) to be seen/run on a web.
So I have built a few TCR reports with control chart against Tivoli performance data and that was somewhat useful.
I believe the IT-Control Chart (see my post about that type of control chart here) would give much more value for analyzing time stamped historical data. Is that possible to build using BIRT?
The BIRT is open source free BI tool (can be downloaded from here). I have downloaded and installed that on my laptop and have built a few reports for one of my customers. One of them was to filter out the exceptionally "bad" objects (servers) using EV criteria (see the linked post here).
Then I have built the IT-Control chart using BIRT. Below is the result:
Yes, it is possible with some limitation I have noticed in the current version of BIRT report designer. You could see it if you compare that with oher IT-Control Charts I have build using R (See example here), SAS (Example here) or EXCEL (here).
Anyway, could you see how that chart reports pro-actively on an issue?
So it is another way (not to program like in R or SAS and not to make manually like in EXCEL) to build IT-Control charts. After it is built that could be submitted to TCR (or other reporting portals) to be seen/run on a web.
He started in 1979 as IBM/370 system engineer. In 1986 he got his PhD. in Robotics at St. Petersburg Technical University (Russia) and then worked as a professor teaching CAD/CAM, Robotics for 12 years. He published 30+ papers and made several presentations for conferences related to the Robotics and Artificial Intelligent fields. In 1999 he moved to the US, worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He made other tech. presentations at IBM z/Series Expo, SPEC.org, Southern and Central Europe CMG and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). Author of “Performance Anomaly Detection” class (at CMG.com). Worked 2 years as the Capacity team lead for IBM, worked for SunTrust Bank for 3 years and then at IBM for 3 years as Sr. IT Architect. Now he works for Capital One bank as IT Manager at the Cloud Engineering and since 2015 he is a member of CMG.org Board of Directors. Runs UT channel iTrubin
Subscribe to:
Posts (Atom)