Thierry Déléris is a French mainframe systems programmer on a team dedicated to performance, metrology and capacity planning. He used some ideas published in Trubin's CMG papers to implement the following:
1. The first part of the solution sends a daily email per CEC with a spreadsheet broken down by LPAR and workload. Thresholds are calculated in R by day of the week, hour of the day, LPAR name and WLM workload, based on six months of history (SMF72 records), with outliers excluded using Tukey's statistical method.
2. The second part of the solution, called STEEDd (Statistical Tool for Enhanced Exceptions Detection and Diagnosis, and a reference to John Steed, the character from the British TV show "The Avengers", and his legendary bowler hat), was developed in Java. It uses the same R-calculated thresholds, but runs a control every 15 minutes, interacting with BMC Mainview on the host to collect the current data (in fact, the data for the last 15 minutes). It provides a main screen to select the metric to control, and a control screen per metric. An email alert is sent to the team if, for some metric, the result is above the high threshold or below the low threshold.
As an example, here is a picture of the control screen used for the CPU metric by workload and LPAR:
The idea of EV (Extra Value or Exception Value, introduced in Trubin’s CMG papers and discussed in this blog) is used there (red bars for EV+ and yellow bars for EV- in the picture above). This helps separate genuine alerts from false ones.
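The threshold calculation and the EV check described above can be sketched roughly as follows. This is only an illustration in Python, not the R and Java code used in the actual solution. Several things are assumptions made for the sketch: the function names (`tukey_fences`, `build_thresholds`, `exception_value`), the shape of the history records, the use of the mean plus or minus three standard deviations of the retained values as the low and high thresholds, and the definition of EV as the signed distance by which a measurement falls outside those thresholds.

```python
from collections import defaultdict
from statistics import mean, quantiles, stdev


def tukey_fences(values, k=1.5):
    """Return Tukey's fences (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def build_thresholds(history, sigmas=3.0, min_samples=4):
    """Compute (low, high) thresholds per (weekday, hour, lpar, workload).

    `history` is an iterable of (weekday, hour, lpar, workload, value)
    tuples; this record layout is hypothetical. Outliers are dropped with
    Tukey's fences, then the thresholds are taken as mean +/- sigmas * stdev
    of the retained values (an assumption, not necessarily the original rule).
    """
    groups = defaultdict(list)
    for weekday, hour, lpar, workload, value in history:
        groups[(weekday, hour, lpar, workload)].append(value)

    thresholds = {}
    for key, values in groups.items():
        if len(values) < min_samples:
            continue  # too little history to say anything useful
        low_fence, high_fence = tukey_fences(values)
        kept = [v for v in values if low_fence <= v <= high_fence]
        if len(kept) < 2:
            continue  # stdev needs at least two points
        centre, spread = mean(kept), stdev(kept)
        thresholds[key] = (centre - sigmas * spread, centre + sigmas * spread)
    return thresholds


def exception_value(value, low, high):
    """Signed distance outside the thresholds.

    Positive (EV+) above `high`, negative (EV-) below `low`, 0.0 otherwise.
    """
    if value > high:
        return value - high
    if value < low:
        return value - low
    return 0.0


if __name__ == "__main__":
    # Made-up CPU figures for one weekday/hour/LPAR/workload combination.
    samples = [10, 11, 12, 13, 14, 100]
    history = [("Mon", 9, "LPAR1", "BATCH", v) for v in samples]
    low, high = build_thresholds(history)[("Mon", 9, "LPAR1", "BATCH")]
    for current in (12, 20, 5):
        ev = exception_value(current, low, high)
        status = "EV+" if ev > 0 else "EV-" if ev < 0 else "ok"
        print(f"value={current} ev={ev:.2f} {status}")
```

In a 15-minute control loop, `exception_value` would be called with the latest measurement for each key, and an alert sent whenever the result is non-zero.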
3. The third part of the solution is on the way: an artificial intelligence solution based on a rule engine is being studied to explore a detected problem hierarchically. This application will enhance the analysis of metric alerts in the manner of an "expert system".
(Posted with Thierry Déléris's permission)