System Management by Exception: Interesting paper about "Adaptive Anomaly Detection in Cloud"

Monday, October 10, 2016

Interesting paper about "Adaptive Anomaly Detection in Cloud"

Adaptive Anomaly Detection in Cloud using Robust and Scalable Principal Component Analysis

by

Bikash Agrawal, Tomasz Wiktorski and Chunming Rong

Abstract

This paper proposes a novel and scalable model for automatic anomaly detection on a large system such as a cloud. Anomaly detection issues early warnings of unusual behavior in dynamic environments by learning system characteristics from normal operational data. Anomalies in large systems are difficult to detect due to heterogeneity, dynamicity, scalability, hidden complexity, and time limitations. To detect anomalous activity in the cloud, we need to monitor the datacenter and collect cloud performance data. In this paper, we propose an adaptive anomaly detection mechanism which investigates principal components of performance metrics. It transforms the performance metrics into a low-rank matrix and then calculates the orthogonal distance using the Robust PCA algorithm. The proposed model updates itself recursively, learning and adjusting the new threshold value in order to minimize reconstruction errors. This paper also investigates robust principal component analysis in distributed environments using Apache Spark as the underlying framework, specifically addressing cases in which normal operation might exhibit multiple hidden modes. The accuracy and sensitivity of the model are tested on Google data center traces and Yahoo! datasets. The model achieves an 87.24% accuracy.
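
The abstract's core mechanism (project each observation onto a few principal components, measure its orthogonal distance from that subspace, and flag it when the distance exceeds a self-adjusting threshold) can be illustrated in plain Python. This is a minimal sketch under stated assumptions: ordinary PCA via power iteration on the sample covariance stands in for the Robust PCA decomposition the paper actually uses; the adaptive rule (mean plus z standard deviations of recent normal distances in a sliding window) is a hypothetical choice, not the paper's threshold update; nothing is distributed on Spark; and `PcaAnomalyDetector` and its helper functions are hypothetical names.

```python
import math
import random
import statistics
from collections import deque


def mean_center(rows):
    """Return column means and a mean-centered copy of the rows."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return mu, [[r[j] - mu[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=200, seed=0):
    """Top-k eigenvectors of a symmetric matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate: remove this direction so the next pass finds the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def orthogonal_distance(x, mu, comps):
    """Length of the part of (x - mu) not explained by the principal subspace."""
    c = [xi - mi for xi, mi in zip(x, mu)]
    resid = c[:]
    for v in comps:
        p = sum(ci * vi for ci, vi in zip(c, v))
        resid = [ri - p * vi for ri, vi in zip(resid, v)]
    return math.sqrt(sum(r * r for r in resid))


class PcaAnomalyDetector:
    """Hypothetical detector: PCA subspace + sliding-window adaptive threshold."""

    def __init__(self, train, k=1, window=50, z=3.0):
        self.mu, centered = mean_center(train)
        self.comps = top_components(covariance(centered), k)
        self.z = z
        # Seed the window with distances of the (assumed normal) training data.
        self.recent = deque(
            (orthogonal_distance(x, self.mu, self.comps) for x in train),
            maxlen=window)

    def threshold(self):
        vals = list(self.recent)
        return statistics.mean(vals) + self.z * statistics.pstdev(vals)

    def score(self, x):
        """Return (distance, is_anomaly); only normal points update the window."""
        d = orthogonal_distance(x, self.mu, self.comps)
        is_anomaly = d > self.threshold()
        if not is_anomaly:
            self.recent.append(d)
        return d, is_anomaly


if __name__ == "__main__":
    rng = random.Random(42)
    # Two correlated metrics, e.g. CPU and network load rising together.
    train = []
    for _ in range(200):
        t = rng.uniform(0.0, 10.0)
        train.append([t, 2.0 * t + rng.gauss(0.0, 0.1)])
    det = PcaAnomalyDetector(train, k=1)
    print(det.score([5.0, 10.0]))  # on the learned correlation line
    print(det.score([5.0, 20.0]))  # breaks the correlation
```

With two correlated metrics, a point on the learned correlation line gets a tiny orthogonal distance, while a point that breaks the correlation lands far from the subspace. Because only non-anomalous distances enter the window, a single spike does not inflate the threshold.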

MY COMMENT: By the way, the paper references the MASF technique, which I have enhanced (see my SETDS methodology) and have been using for years to capture anomalies (exceptions) and sudden short-term trends across huge server farms (20,000+ servers), including private and public clouds. Note that my approach is much simpler, and although MASF does have a high rate of false positives, SETDS has a way to handle that well.
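
The MASF idea mentioned above, building a reference set of per-time-slot statistics from history and treating values outside the control limits (typically mean plus or minus 3 standard deviations) as exceptions, can be sketched as follows, together with an exception-value score in the spirit of SETDS. This is a minimal sketch, assuming: history arrives as (bucket, value) pairs where the bucket is a time slot such as hour of week; the lower limit is clipped at zero for non-negative metrics; the exception value is the summed excursion above the upper limit and below the lower limit; and reporting only when that sum exceeds a hypothetical `min_ev` cutoff is one illustrative way to suppress isolated false positives. All function names are hypothetical, and this is not the SETDS code itself.

```python
import statistics
from collections import defaultdict


def control_limits(history, sigmas=3.0):
    """MASF-style reference set: per-bucket mean +/- sigmas * stdev.

    history: iterable of (bucket, value) pairs, e.g. bucket = hour of week.
    Returns {bucket: (lcl, mean, ucl)}.
    """
    groups = defaultdict(list)
    for bucket, value in history:
        groups[bucket].append(value)
    limits = {}
    for bucket, values in groups.items():
        m = statistics.mean(values)
        s = statistics.stdev(values) if len(values) > 1 else 0.0
        # Assumption: metrics are non-negative, so the lower limit is clipped at 0.
        limits[bucket] = (max(0.0, m - sigmas * s), m, m + sigmas * s)
    return limits


def exception_value(actuals, limits):
    """Hypothetical SETDS-style score: summed excursions outside the limits.

    Returns (ev_up, ev_down): area above UCL and area below LCL (both >= 0).
    """
    ev_up = ev_down = 0.0
    for bucket, value in actuals:
        lcl, _, ucl = limits[bucket]
        if value > ucl:
            ev_up += value - ucl
        elif value < lcl:
            ev_down += lcl - value
    return ev_up, ev_down


def is_exception(actuals, limits, min_ev):
    """Flag only when the total excursion exceeds min_ev (filters tiny blips)."""
    ev_up, ev_down = exception_value(actuals, limits)
    return ev_up > min_ev or ev_down > min_ev


if __name__ == "__main__":
    # Four weeks of history for four time slots, alternating 50 and 52.
    history = [(h, 50.0 + (w % 2) * 2.0) for w in range(4) for h in range(4)]
    limits = control_limits(history)
    today = [(0, 51.0), (1, 90.0), (2, 50.0), (3, 51.0)]
    print(exception_value(today, limits), is_exception(today, limits, 10.0))
```

Because the score accumulates the area outside the limits rather than counting individual points, one brief excursion just over the upper limit contributes little, while a large or sustained deviation builds a large exception value.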

Igor Trubin

He started in 1979 as an IBM/370 system engineer. In 1986 he earned his PhD in Robotics at St. Petersburg Technical University (Russia) and then worked as a professor teaching CAD/CAM and Robotics for 12 years. He published 30+ papers and made several presentations at conferences in the Robotics and Artificial Intelligence fields. In 1999 he moved to the US and worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He made other technical presentations at IBM z/Series Expo, SPEC.org, and Southern and Central Europe CMG, and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). He is the author of the “Performance Anomaly Detection” class (at CMG.com). He worked 2 years as the Capacity team lead for IBM, worked for SunTrust Bank for 3 years, and then at IBM for 3 years as a Sr. IT Architect. Now he works for Capital One bank as an IT Manager in Cloud Engineering, and since 2015 he has been a member of the CMG.org Board of Directors. He runs the YouTube channel iTrubin.
