System Management by Exception: The 8th ACM/SPEC on International Conference on Performance Engineering: MASF and Control charts

Monday, May 1, 2017

ICPE '17 - conference site.

The following paper gives a nice summary of how I use MASF and control charts, and then proposes a similar but improved ...

Technique for Detecting Early-Warning Signals of Performance Deterioration in Large Scale Software Systems
Raghu Ramakrishnan, Tata Consultancy Services, Noida, UP, India
Arvinder Kaur, USICT, Guru Gobind Singh Indraprastha University, Dwarka, Delhi, India

ABSTRACT: The detection of early-warning signals of performance deterioration can help technical support teams take swift remedial action, thus ensuring rigor in production support operations of large-scale software systems. Performance anomalies or deterioration, if left unattended, often result in system slowness and unavailability. In this paper, we present a simple, intuitive and low-overhead technique for recognizing the early warning signs in near real time, before they impact the system. The technique is based on the inverse relationship that exists between throughput and average response time in a closed system. Because of this relationship, a significant increase in the average system response time causes an abrupt fall in system throughput. To identify such occurrences automatically, Individuals and Moving Range (XmR) control charts are used. We also provide a case study from a real-world production system in which the technique has been used successfully. The use of this technique has significantly reduced the occurrence of performance-related incidents in our daily operations. The technique is tool-agnostic and can be easily implemented in popular system monitoring tools by building custom extensions.
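To make the XmR idea from the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names `xmr_limits` and `xmr_signals` and the sample throughput numbers are assumptions made for illustration. It uses the standard XmR chart constants (2.66 for the individuals limits, 3.267 for the moving range upper limit), computes the limits from a baseline of throughput samples, and flags new samples that fall outside them.

```python
from statistics import mean


def xmr_limits(baseline):
    """Compute XmR control limits from a baseline series (hypothetical helper)."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    center = mean(baseline)
    mr_bar = mean(moving_ranges)
    return {
        "center": center,
        "lcl": center - 2.66 * mr_bar,   # lower limit, individuals chart
        "ucl": center + 2.66 * mr_bar,   # upper limit, individuals chart
        "mr_ucl": 3.267 * mr_bar,        # upper limit, moving range chart
    }


def xmr_signals(baseline, observations):
    """Return indexes of observations that break either chart's limits."""
    limits = xmr_limits(baseline)
    previous = baseline[-1]
    signals = []
    for i, x in enumerate(observations):
        outside_x = x < limits["lcl"] or x > limits["ucl"]
        outside_mr = abs(x - previous) > limits["mr_ucl"]
        if outside_x or outside_mr:
            signals.append(i)
        previous = x
    return signals


# Illustrative throughput samples (requests per interval), not data from the paper.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
print(xmr_signals(baseline, [101, 99, 80, 82]))
```

In this made-up series, the abrupt drop in throughput in the last two samples is the kind of event the abstract describes as following a significant rise in average response time.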

".....The use of control charts, MASF and its variations for monitoring software systems was proposed by Trubin et al. [24][25][26][27]. MASF partitions the time during which the system is operational, into hourly, daily or weekly reference segments to characterize repeatable or similar workload behavior experienced by a software system [8]. For example, the workload encountered by the system on Monday between 9:00 a.m. - 10:00 a.m. may be different from the workload between 10:00 a.m. - 11:00 a.m. Each segment is characterized by its mean and standard deviation. The number of reference sets can be further reduced using clustering techniques. The upper and lower limits are established for each reference at three standard deviations from the mean..."

[24] I. Trubin. Review of IT Control Chart. Journal of Emerging Trends in Computing and Information Sciences, 4(11):857–868, Dec. 2013.
[25] I. Trubin and V. C. Scmg. Capturing workload pathology by statistical exception detection system. In Proceedings of the Computer Measurement Group, 2005.
[26] I. A. Trubin. Global and Application Level Exception Detection System, Based on MASF Technique. In 28th International Computer Measurement Group Conference, December 8-13, 2002, Reno, Nevada, USA, Proceedings, pages 557–566, 2002.
[27] I. A. Trubin and L. Merritt. Mainframe Global and Workload Level Statistical Exception Detection System, Based on MASF. In 30th International Computer Measurement Group Conference, December 5-10, 2004, Las Vegas, Nevada, USA, Proceedings, pages 671–678, 2004.
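The MASF description quoted above (hourly reference segments, each characterized by its mean and standard deviation, with limits at three standard deviations) can be sketched roughly as follows. This is only an illustration under stated assumptions: the segment key is (weekday, hour), the sample standard deviation is used, and the names `masf_reference` and `is_exception` as well as the sample values are hypothetical rather than taken from the cited papers.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def masf_reference(history):
    """Build per-segment limits from (timestamp, value) pairs.

    Assumption: one reference segment per (weekday, hour) combination.
    """
    buckets = defaultdict(list)
    for ts, value in history:
        buckets[(ts.weekday(), ts.hour)].append(value)

    reference = {}
    for key, values in buckets.items():
        if len(values) < 2:
            continue  # not enough history to estimate a standard deviation
        m, s = mean(values), stdev(values)
        reference[key] = (m - 3 * s, m + 3 * s)
    return reference


def is_exception(reference, ts, value):
    """True if value falls outside the limits of its reference segment."""
    limits = reference.get((ts.weekday(), ts.hour))
    if limits is None:
        return False  # no reference for this segment yet
    lower, upper = limits
    return value < lower or value > upper


# Four Mondays of 9:00 a.m. measurements (illustrative values only).
history = [
    (datetime(2017, 4, 3, 9), 50),
    (datetime(2017, 4, 10, 9), 52),
    (datetime(2017, 4, 17, 9), 48),
    (datetime(2017, 4, 24, 9), 50),
]
reference = masf_reference(history)
print(is_exception(reference, datetime(2017, 5, 1, 9), 60))
```

Keeping a separate pair of limits per segment is what lets a Monday 9:00 a.m. value be judged against other Monday 9:00 a.m. values rather than against the whole week.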

Full text: http://dl.acm.org/citation.cfm?id=3044533

Igor Trubin

He started in 1979 as an IBM/370 system engineer. In 1986 he received his PhD in Robotics from St. Petersburg Technical University (Russia) and then spent 12 years as a professor teaching CAD/CAM and Robotics. He published 30+ papers and gave several conference presentations in the fields of Robotics and Artificial Intelligence. In 1999 he moved to the US and worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He has given other technical presentations at IBM z/Series Expo, SPEC.org, and Southern and Central Europe CMG, and has run several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). He is the author of the “Performance Anomaly Detection” class (at CMG.com). He worked for 2 years as the Capacity team lead for IBM, for 3 years at SunTrust Bank, and then for 3 years at IBM as Sr. IT Architect. He now works for Capital One bank as an IT Manager in Cloud Engineering, and since 2015 he has been a member of the CMG.org Board of Directors. He runs the UT channel iTrubin.
