System Management by Exception: Not a MASF Based Statistical Techniques (Entropy-based) for Anomaly Detection in Data Centers (and Clouds)

Wednesday, October 24, 2012

The following papers, listed on Mendeley, criticize the MASF Gaussian assumption and offer other methods (Tukey and Relative Entropy) for detecting anomalies statistically. (BTW, I tried using entropy analysis to capture performance anomalies; see my other post.)

1. Statistical techniques for online anomaly detection in data centers
by Chengwei Wang, Krishnamurthy Viswanathan, Lakshminarayan Choudur, Vanish Talwar, Wade Satterfield, Karsten Schwan

Abstract

Online anomaly detection is an important step in data center management, requiring light-weight techniques that provide sufficient accuracy for subsequent diagnosis and management actions. This paper presents statistical techniques based on the Tukey and Relative Entropy statistics, and applies them to data collected from a production environment and to data captured from a testbed for multi-tier web applications running on server class machines. The proposed techniques are lightweight and improve over standard Gaussian assumptions in terms of performance.
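As a rough illustration of the two statistics named in this abstract, here is a minimal Python sketch. It is not the authors' implementation: the fence multiplier `k=1.5` (Tukey's conventional value), the bin edges, the additive smoothing, and the sample data are all assumptions made for the example.

```python
import math
from bisect import bisect_right
from statistics import quantiles


def tukey_outliers(samples, k=1.5):
    """Return the samples outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in samples if x < low or x > high]


def binned_distribution(samples, edges, smoothing=1.0):
    """Smoothed relative frequencies of samples over the bins given by edges.

    Values outside the edges are clamped into the first or last bin; the
    additive smoothing keeps every bin non-zero so the log ratio is defined.
    """
    counts = [smoothing] * (len(edges) - 1)
    for x in samples:
        i = min(max(bisect_right(edges, x) - 1, 0), len(counts) - 1)
        counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical CPU utilization samples (percent); not data from the paper.
baseline = [10, 12, 15, 11, 14, 13, 12, 16]
current = [85, 90, 95, 88, 92, 91, 87, 93]
edges = [0, 20, 40, 60, 80, 100]

print(tukey_outliers([10, 11, 12, 11, 10, 12, 11, 50]))  # flags the 50
score = relative_entropy(binned_distribution(current, edges),
                         binned_distribution(baseline, edges))
print(score)  # grows as the current window drifts away from the baseline
```

Neither function assumes the metric is normally distributed: the fences are built from quartiles, and the relative entropy compares whole histograms rather than a mean and a standard deviation.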

2. Online detection of utility cloud anomalies using metric distributions

by Chengwei Wang, V Talwar, K Schwan, P Ranganathan

Abstract

The online detection of anomalies is a vital element of operations in data centers and in utility clouds like Amazon EC2. Given ever-increasing data center sizes coupled with the complexities of systems software, applications, and workload patterns, such anomaly detection must operate automatically, at runtime, and without the need for prior knowledge about normal or anomalous behaviors. Further, detection should function for different levels of abstraction like hardware and software, and for the multiple metrics used in cloud computing systems. This paper proposes EbAT - Entropy-based Anomaly Testing - offering novel methods that detect anomalies by analyzing for arbitrary metrics their distributions rather than individual metric thresholds. Entropy is used as a measurement that captures the degree of dispersal or concentration of such distributions, aggregating raw metric data across the cloud stack to form entropy time series. For scalability, such time series can then be combined hierarchically and across multiple cloud subsystems. Experimental results on utility cloud scenarios demonstrate the viability of the approach. EbAT outperforms threshold-based methods with on average 57.4% improvement in accuracy of anomaly detection and also does better by 59.3% on average in false alarm rate with a 'near-optimum' threshold-based method.
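The idea of turning raw metric samples into an entropy time series can be sketched in a few lines of Python. This is a deliberately simplified illustration and not EbAT itself: the fixed-width binning, the non-overlapping windows, the jump rule in `flag_changes`, and all parameter values are assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (bits) of the empirical distribution of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_series(samples, window, bin_width):
    """Entropy of each consecutive, non-overlapping window of samples.

    Each sample is mapped to a bin index first, so the entropy measures how
    spread out the window's values are across bins.
    """
    series = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        bins = [int(x // bin_width) for x in chunk]
        series.append(shannon_entropy(bins))
    return series


def flag_changes(series, threshold):
    """Indices where entropy jumps by more than threshold from the prior window."""
    return [i for i in range(1, len(series))
            if abs(series[i] - series[i - 1]) > threshold]


# Hypothetical utilization samples (percent): two quiet windows, one window
# with widely dispersed values, then a quiet window again.
samples = [11, 12, 13, 14,
           11, 12, 13, 14,
           5, 35, 65, 95,
           12, 13, 11, 14]

series = entropy_series(samples, window=4, bin_width=10)
print(series)
print(flag_changes(series, threshold=1.0))
```

A concentrated window (all values in one bin) has entropy near zero, while a dispersed one has high entropy, so the check looks at the shape of the distribution rather than comparing any single value against a threshold.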

3. EbAT : Online Methods for Detecting Utility Cloud Anomalies

Chengwei Wang in Middleware (2009)

4. Performance Metric Selection for Autonomic Anomaly Detection on Cloud Computing Systems

Song Fu in 2011 IEEE Global Telecommunications Conference GLOBECOM 2011 (2011)

5. Mining anomalies using traffic feature distributions

Anukool Lakhina, Mark Crovella, Christophe Diot in ACM SIGCOMM Computer Communication Review (2005)

6. Ranking Anomalies in Data Centers

Krishnamurthy Viswanathan, Lakshminarayan Choudur, Vanish Talwar et al. in NOMS (2012), pp. 1-8

7. Monalytics : Online Monitoring and Analytics for Managing Large Scale Data Centers

Greg Eisenhauer, Matthew Wolf, Chengwei Wang in ICAC (2010)

8. Fast Anomaly Detection for Large Data Centers

Ang Li, Lin Gu, Kuai Xu in 2010 IEEE Global Telecommunications Conference GLOBECOM 2010 (2010)

9. Online Reactive Anomaly Detection over Stream Data

Yan Fu, Jun-Lin Zhou, Yue Wu in 2008 International Conference on Apperceiving Computing and Intelligence Analysis (2008)

10. Semantic anomaly detection in online data sources

O Raz, P Koopman, M Shaw in Proceedings of the 24th International Conference on Software Engineering ICSE 2002 (2002)

11. Statistical anomaly detection via httpd data analysis

Daniel Q Naiman in Computational Statistics & Data Analysis (2004)

12. A comparative study of real-valued negative selection to statistical anomaly detection techniques

T Stibor, J Timmis, C Eckert in Comparative and General Pharmacology (2005)

Igor Trubin

He started in 1979 as an IBM/370 system engineer. In 1986 he received his PhD in Robotics from St. Petersburg Technical University (Russia) and then worked for 12 years as a professor teaching CAD/CAM and Robotics. He published 30+ papers and gave several conference presentations in the fields of Robotics and Artificial Intelligence. In 1999 he moved to the US and worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He gave other technical presentations at IBM z/Series Expo, SPEC.org, and Southern and Central Europe CMG, and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). He is the author of the “Performance Anomaly Detection” class (at CMG.org). He worked 2 years as the Capacity team lead for IBM, 3 years for SunTrust Bank, and then 3 years at IBM as Sr. IT Architect. He now works for Capital One bank as IT Manager in Cloud Engineering, and since 2015 he has been a member of the CMG.org Board of Directors. He runs the UT channel iTrubin.
