Wednesday, March 25, 2009

Performance Anomaly ("Perfomaly") Detection. Parts 1-4: Power of Control Charts


_______
Based on my old workshop (Power of Control Chart), which I ran a few times several years ago, I have developed an updated version that will be part of a training course:

Performance Anomalies ("Perfomalies") Detection

The course will consist of the following parts:

1. Introduction to Performance Anomaly Detection
2. Detecting performance anomalies by Control Charts - lecture.
3. Building control charts by using Excel - hands-on exercises.
4. Detecting performance anomalies by control charts using R on a cloud server (AWS) - hands-on exercises.
            - includes an instruction video on how to build the R environment on the AWS cloud
5. Detecting Novelties in performance data by using Exception Value (EV) approach (type of “knee” detection) - lecture
6. Detecting Novelties in performance data - hands-on exercises.
7. Detecting normality in the performance workload data by neural nets and deep learning - lecture
8. Detecting normality by using R and R NN packages - hands-on exercises.
9. Detecting anomalous short-lived objects by using entropy calculation - lecture
10. Detecting anomalous short-lived objects - hands-on exercises.

Parts 1-4 are the updated version of my old workshop, which covers:

- What is a control chart? A little bit of theory and history.
- Where control charts are used: a review of some systems performance tools on the market that build and use control charts.
- How SEDS uses them: MASF charts vs. SPC ones; a long gallery of charts already published in CMG papers, plus some new ones, with explanations of how to read them.
- How to build a control chart: using Excel for interactive analysis and R to automate control chart generation, with a live demonstration of the technique.
- NEW: How to build the R environment (RStudio) on a cloud (AWS) server (EC2) and use R code to build control charts against your own data.

Data and R script for testing the R environment on AWS EC2 and for the 1st R exercise:

Below are the data in CSV format (to be copied into a test.csv file) and a simple R script that builds the monthly profile of real Unix file system space utilization in the form of a monthly control chart.

test.csv

day,CurrentMonthData,UpLimit,Mean,LowLimit
1,0.45,0.54,0.42,0.31
2,0.45,0.54,0.42,0.31
3,0.45,0.54,0.42,0.31
4,0.45,0.54,0.42,0.31
5,0.45,0.54,0.42,0.31
6,0.45,0.53,0.43,0.32
7,0.45,0.54,0.43,0.32
8,0.45,0.54,0.43,0.32
9,0.45,0.53,0.43,0.33
10,0.45,0.53,0.43,0.33
11,0.45,0.53,0.43,0.33
12,0.72,0.53,0.43,0.33
13,0.72,0.53,0.43,0.33
14,0.72,0.53,0.42,0.32
15,0.45,0.53,0.42,0.32
16,0.45,0.55,0.43,0.31
17,0.45,0.55,0.44,0.33
18,1.00,0.54,0.44,0.33
19,0.84,0.54,0.44,0.33
20,0.84,0.54,0.44,0.34
21,0.84,0.54,0.44,0.34
22,,0.54,0.44,0.34
23,,0.52,0.44,0.36
24,,0.52,0.44,0.36
25,,0.51,0.43,0.36
26,,0.66,0.46,0.26
27,,0.66,0.46,0.25
28,,0.62,0.45,0.28
29,,0.62,0.45,0.28
30,,0.54,0.43,0.32
31,,0.54,0.43,0.32


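## R script to plot a control chart from CSV input - I.Trubin
###############################################################
# Read the monthly profile; empty CurrentMonthData cells are read as NA
cchrt <- read.csv("test.csv", header = TRUE)

# Current month as a thick black line; ylim fixes the scale to 0..1
# and ann = FALSE suppresses the default title and axis labels
plot(cchrt$day, cchrt$CurrentMonthData, type = "l", col = "black",
     ylim = c(0, 1), lwd = 2, ann = FALSE)

# Overlay the limits and the mean; lines() does not accept ylim or ann,
# which only produced warnings in the points() calls
lines(cchrt$day, cchrt$UpLimit,  col = "red",   lwd = 1)
lines(cchrt$day, cchrt$Mean,     col = "green", lwd = 1)
lines(cchrt$day, cchrt$LowLimit, col = "blue",  lwd = 1)

# Axis labels and title (y label kept from the original script;
# the sample data itself is file system space utilization, 0..1)
mtext("# of transactions (K)", side = 2, line = 3.0)
mtext("days of month",         side = 1, line = 3.0)
mtext("CONTROL CHART",         side = 3, line = 1.0)

legend(9, 0.3, c("Current Month", "UpperLimit", "Mean", "LowerLimit"),
       col = c("black", "red", "green", "blue"), lwd = c(2, 1, 1, 1), bty = "n")
###############################################################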


Result is in the picture.

(Other examples posted here: Near-Real-Time IT-Control Charts )
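The chart shows the out-of-limit days visually; the same comparison can also be listed in code. Below is a minimal sketch, assuming that an "exception" simply means a current-month value outside the UpLimit/LowLimit band. The variable names (above, below, exceptions) are illustrative, and the four rows are copied from test.csv above so the snippet runs on its own.

```r
# A few rows copied from test.csv (day 22 has no current-month value yet)
cchrt <- read.csv(text = "day,CurrentMonthData,UpLimit,Mean,LowLimit
11,0.45,0.53,0.43,0.33
12,0.72,0.53,0.43,0.33
18,1.00,0.54,0.44,0.33
22,,0.54,0.44,0.34")

# Days with data that fall above the upper or below the lower limit;
# days without a current-month value (NA) are never flagged
has_data <- !is.na(cchrt$CurrentMonthData)
above <- has_data & cchrt$CurrentMonthData > cchrt$UpLimit
below <- has_data & cchrt$CurrentMonthData < cchrt$LowLimit

exceptions <- cchrt[above | below,
                    c("day", "CurrentMonthData", "UpLimit", "LowLimit")]
print(exceptions)
```

To check the whole month, replace the inline rows with read.csv("test.csv").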


If you would like to attend my workshop, leave your contact information in a comment on this post.








The raw time-series CSV data for this case is below:

MonthlyRawData.csv

date,metric
6/1/2008,0.39
6/2/2008,0.39
6/3/2008,0.39
6/4/2008,0.39
6/5/2008,0.39
6/6/2008,0.39
6/7/2008,0.39
6/8/2008,0.39
6/9/2008,0.39
6/10/2008,0.39
6/11/2008,0.39
6/12/2008,0.39
6/13/2008,0.39
6/14/2008,0.39
6/15/2008,0.39
6/16/2008,0.39
6/17/2008,0.39
6/18/2008,0.39
6/19/2008,0.39
6/20/2008,0.39
6/21/2008,0.39
6/22/2008,0.39
6/23/2008,0.4
6/24/2008,0.4
6/25/2008,0.4
6/26/2008,0.63
6/27/2008,0.63
6/28/2008,0.59
6/29/2008,0.57
6/30/2008,0.37
7/1/2008,0.37
7/2/2008,0.37
7/3/2008,0.37
7/4/2008,0.37
7/5/2008,0.37
7/6/2008,0.38
7/7/2008,0.38
7/8/2008,0.38
7/9/2008,0.39
7/10/2008,0.39
7/11/2008,0.39
7/12/2008,0.39
7/13/2008,0.39
7/14/2008,0.37
7/15/2008,0.37
7/16/2008,0.37
7/17/2008,0.37
7/18/2008,0.37
7/19/2008,0.37
7/20/2008,0.38
7/21/2008,0.38
7/22/2008,0.38
7/23/2008,0.39
7/24/2008,0.39
7/25/2008,0.39
7/26/2008,0.38
7/27/2008,0.37
7/28/2008,0.37
7/29/2008,0.37
7/30/2008,0.37
7/31/2008,0.37
8/1/2008,0.37
8/2/2008,0.37
8/3/2008,0.37
8/4/2008,0.37
8/5/2008,0.37
8/6/2008,0.37
8/7/2008,0.37
8/8/2008,0.37
8/9/2008,0.38
8/10/2008,0.38
8/11/2008,0.38
8/12/2008,0.38
8/13/2008,0.38
8/14/2008,0.38
8/15/2008,0.38
8/16/2008,0.38
8/17/2008,0.45
8/18/2008,0.45
8/19/2008,0.45
8/20/2008,0.45
8/21/2008,0.45
8/22/2008,0.45
8/23/2008,0.46
8/24/2008,0.46
8/25/2008,0.44
8/26/2008,0.44
8/27/2008,0.44
8/28/2008,0.44
8/29/2008,0.44
8/30/2008,0.44
8/31/2008,0.45
9/1/2008,0.45
9/2/2008,0.45
9/3/2008,0.45
9/4/2008,0.45
9/5/2008,0.45
9/6/2008,0.45
9/7/2008,0.45
9/8/2008,0.45
9/9/2008,0.45
9/10/2008,0.45
9/11/2008,0.45
9/12/2008,0.46
9/13/2008,0.46
9/14/2008,0.44
9/15/2008,0.44
9/16/2008,0.44
9/17/2008,0.44
9/18/2008,0.44
9/19/2008,0.44
9/20/2008,0.45
9/21/2008,0.45
9/22/2008,0.45
9/23/2008,0.46
9/24/2008,0.46
9/25/2008,0.45
9/26/2008,0.46
9/27/2008,0.46
9/28/2008,0.45
9/29/2008,0.45
9/30/2008,0.45
10/1/2008,0.45
10/2/2008,0.45
10/3/2008,0.45
10/4/2008,0.45
10/5/2008,0.45
10/6/2008,0.45
10/7/2008,0.45
10/8/2008,0.45
10/9/2008,0.45
10/10/2008,0.45
10/11/2008,0.45
10/12/2008,0.45
10/13/2008,0.45
10/14/2008,0.45
10/15/2008,0.45
10/16/2008,0.45
10/17/2008,0.45
10/18/2008,0.45
10/19/2008,0.45
10/20/2008,0.45
10/21/2008,0.45
10/22/2008,0.45
10/23/2008,0.45
10/24/2008,0.45
10/25/2008,0.45
10/26/2008,0.45
10/27/2008,0.45
10/28/2008,0.45
10/29/2008,0.45
10/30/2008,0.45
10/31/2008,0.45
11/1/2008,0.45
11/2/2008,0.45
11/3/2008,0.45
11/4/2008,0.45
11/5/2008,0.45
11/6/2008,0.45
11/7/2008,0.45
11/8/2008,0.45
11/9/2008,0.45
11/10/2008,0.45
11/11/2008,0.45
11/12/2008,0.45
11/13/2008,0.45
11/14/2008,0.45
11/15/2008,0.45
11/16/2008,0.48
11/17/2008,0.48
11/18/2008,0.48
11/19/2008,0.48
11/20/2008,0.48
11/21/2008,0.48
11/22/2008,0.48
11/23/2008,0.44
11/24/2008,0.44
11/25/2008,0.44
11/26/2008,0.44
11/27/2008,0.44
11/28/2008,0.44
11/29/2008,0.44
11/30/2008,0.45
12/1/2008,0.45
12/2/2008,0.45
12/3/2008,0.45
12/4/2008,0.45
12/5/2008,0.45
12/6/2008,0.45
12/7/2008,0.45
12/8/2008,0.45
12/9/2008,0.45
12/10/2008,0.45
12/11/2008,0.45
12/13/2008,0.46
12/14/2008,0.45
12/15/2008,0.45
12/16/2008,0.45
12/17/2008,0.45
12/18/2008,0.45
12/19/2008,0.45
12/20/2008,0.45
12/21/2008,0.45
12/22/2008,0.45
12/23/2008,0.45
12/24/2008,0.45
12/25/2008,0.45
12/26/2008,0.45
12/27/2008,0.45
12/28/2008,0.45
12/29/2008,0.44
12/30/2008,0.44
12/31/2008,0.44
1/1/2009,0.45
1/2/2009,0.45
1/3/2009,0.45
1/4/2009,0.45
1/5/2009,0.45
1/6/2009,0.45
1/7/2009,0.45
1/8/2009,0.45
1/9/2009,0.45
1/10/2009,0.45
1/11/2009,0.45
1/12/2009,0.45
1/13/2009,0.45
1/14/2009,0.45
1/15/2009,0.45
1/16/2009,0.45
1/17/2009,0.45
1/18/2009,0.45
1/19/2009,0.45
1/20/2009,0.45
1/21/2009,0.45
1/22/2009,0.45
1/23/2009,0.45
1/24/2009,0.45
1/25/2009,0.45
1/26/2009,0.45
1/27/2009,0.45
1/28/2009,0.45
1/29/2009,0.45
1/30/2009,0.45
1/31/2009,0.45
2/1/2009,0.45
2/2/2009,0.45
2/3/2009,0.45
2/4/2009,0.45
2/5/2009,0.46
2/6/2009,0.46
2/7/2009,0.46
2/8/2009,0.46
2/9/2009,0.45
2/10/2009,0.45
2/11/2009,0.45
2/12/2009,0.45
2/13/2009,0.45
2/14/2009,0.45
2/15/2009,0.45
2/16/2009,0.45
2/17/2009,0.45
2/18/2009,0.45
2/19/2009,0.45
2/20/2009,0.45
2/21/2009,0.45
2/22/2009,0.45
2/23/2009,0.45
2/24/2009,0.45
2/25/2009,0.45
2/26/2009,0.45
2/27/2009,0.45
2/28/2009,0.45
3/1/2009,0.45
3/2/2009,0.45
3/3/2009,0.45
3/4/2009,0.45
3/5/2009,0.45
3/6/2009,0.45
3/7/2009,0.45
3/8/2009,0.45
3/9/2009,0.45
3/10/2009,0.45
3/11/2009,0.45
3/12/2009,0.72
3/13/2009,0.72
3/14/2009,0.72
3/15/2009,0.45
3/16/2009,0.45
3/17/2009,0.45
3/18/2009,1
3/19/2009,0.84
3/20/2009,0.84
3/21/2009,0.84

How do you transform that raw data into the profile data used above to build the control chart? That and much more is covered in the workshop. SIGN UP!
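As a rough illustration only (the workshop formulas are not given in this post), here is one possible sketch of such a transformation. It follows the idea described in the comments below, comparing each day of the current month with the same day of several previous months, and it assumes that the limits are the mean plus or minus k standard deviations with k = 3, a common SPC choice. The function name build_profile and its parameters are hypothetical.

```r
# Hypothetical helper: build a day-of-month profile from raw daily data.
# Assumption: limits = mean +/- k * sd of the same day-of-month over all
# months before current_month (given as "YYYY-MM").
build_profile <- function(raw, current_month, k = 3) {
  d <- as.Date(as.character(raw$date), format = "%m/%d/%Y")
  raw$day   <- as.integer(format(d, "%d"))
  raw$month <- format(d, "%Y-%m")

  ref <- raw[raw$month <  current_month, ]   # reference set: earlier months
  cur <- raw[raw$month == current_month, ]   # the month being charted

  m <- tapply(ref$metric, ref$day, mean)     # per-day-of-month mean
  s <- tapply(ref$metric, ref$day, sd)       # per-day-of-month std. deviation

  profile <- data.frame(day      = as.integer(names(m)),
                        UpLimit  = as.numeric(m + k * s),
                        Mean     = as.numeric(m),
                        LowLimit = as.numeric(m - k * s))
  # Days not yet observed in the current month stay NA
  profile$CurrentMonthData <- cur$metric[match(profile$day, cur$day)]
  profile[, c("day", "CurrentMonthData", "UpLimit", "Mean", "LowLimit")]
}

# Usage, if the raw data above was saved as MonthlyRawData.csv
if (file.exists("MonthlyRawData.csv")) {
  raw <- read.csv("MonthlyRawData.csv")
  profile <- build_profile(raw, "2009-03")
  write.csv(round(profile, 2), "profile.csv", row.names = FALSE, na = "")
}
```

The output has the same columns as test.csv, so it can be fed to the plotting script. Whether it matches the numbers in test.csv depends on the assumed k and on which months are used as the reference set.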
____________________________
APPENDIX: Script to install R, Shiny and RStudio on an AWS EC2 instance.

#!/bin/bash
#install R
yum install -y R

#install RStudio-Server 1.1.423-x86_64
wget https://download2.rstudio.org/rstudio-server-rhel-1.1.423-x86_64.rpm
yum install -y --nogpgcheck rstudio-server-rhel-1.1.423-x86_64.rpm
rm rstudio-server-rhel-1.1.423-x86_64.rpm

#install shiny and shiny-server (2017-08-25)
R -e "install.packages('shiny', repos='http://cran.rstudio.com/')"
wget https://download3.rstudio.org/centos5.9/x86_64/shiny-server-1.5.4.869-rh5-x86_64.rpm
yum install -y --nogpgcheck shiny-server-1.5.4.869-rh5-x86_64.rpm
rm shiny-server-1.5.4.869-rh5-x86_64.rpm

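#add user(s): replace "username" with a real login name;
#the initial password is set equal to the login name, so change it after the first login
useradd username
echo username:username | chpasswd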

2 comments:

  1. Hi Igor,
    In your monthly control charts, how are you computing reference set? Are you taking means for each day-of-month from data on smaller intervals of time, i.e. averaging all hours for each day into one mean-per-day-of-month? Then you determine means for previous months for that same day-of-month?
    If so, then the 1st day of the month could be affected by the day-of-week it falls on. How do you adjust for that?

  2. Not exactly; it compares each day of the current month with the daily averages for the same day in several previous months.
    But you are right, that approach conflicts with the days of the week and might compare Sundays with Mondays. But for some metrics with a strictly monthly cycle (e.g. with month-end procedures) it could make sense. I took real disk space data as an example. I plan to publish the spreadsheet with the formulas I used in my workshop on my blog and will send you the link.
