System Management by Exception
This blog relates to experiences in the Systems Capacity and Availability areas, focusing on statistical filtering, pattern recognition, and BI analysis and reporting techniques (SPC, APC, MASF, 6-SIGMA, SEDS/SETDS and others).
Thursday, December 26, 2024
Orthodox Jewish musicians (the Gat Brothers) play the Beatles in the Mamilla mall (Jerusalem, Israel)
Friday, November 22, 2024
ChatGPT reviewed the paper "Detecting Past and Future Change Points in Performance Data"
Review of the Paper: Detecting Past and Future Change Points in Performance Data
Pros
-
Comprehensive Introduction to SETDS: The paper provides a detailed explanation of the Statistical Exception and Trend Detection System (SETDS) methodology, which includes statistical filtering, pattern recognition, and IT-control charts. This thorough presentation ensures clarity in understanding the core concepts and their applications.
-
Innovative Use of Exception Values (EVs): The introduction of EVs to measure anomalies quantitatively is a strength. The geometric and statistical interpretation of EVs as indicators of upward or downward trends is well-explained and intuitive.
-
Real-World Application via Perfomalist: The practical implementation of SETDS through the Perfomalist tool demonstrates its usability. Its application to the MongoDB dataset showcases its capability to handle real-world performance data challenges effectively.
-
Extensibility of the Methodology: The paper highlights potential enhancements to the Perfomalist API, such as predicting future change points and integrating critical thresholds, indicating forward-thinking and adaptability.
-
Visualization and Examples: The inclusion of IT-control charts, 2D/3D models, and examples of API inputs and outputs enriches the explanation. It provides a tangible understanding of how the system operates.
-
Focus on Performance Engineering Needs: By addressing anomaly detection, trend forecasting, and capacity management, the paper targets practical and high-priority IT challenges.
Cons
-
Limited Comparative Analysis: Although the paper mentions combining SETDS with XGBoost and a voting classifier, it lacks a comprehensive comparison with other existing change point detection methodologies. This limits understanding of its relative advantages and trade-offs.
-
Narrow Scope of Testing: The primary dataset used for evaluation is from MongoDB, which may not represent diverse real-world scenarios. Broader testing across varied datasets and industries could strengthen the claims.
-
Technical Jargon: While thorough, the explanations are heavily technical and may pose a barrier to readers unfamiliar with statistical concepts or performance monitoring.
-
Potential Over-reliance on Assumptions: The use of historical data to establish baselines assumes data stationarity and normality, which may not hold true in all dynamic environments.
-
Limited Exploration of Future Change Point Detection: While the concept of future change point prediction is introduced, its practical implementation and effectiveness remain underexplored.
-
API Accessibility Details: The paper lacks comprehensive information on API performance, scalability, and robustness in high-throughput environments, which are critical for enterprise adoption.
Conclusion
The paper presents a robust framework for detecting and forecasting change points in performance data using SETDS and the Perfomalist tool. Its strengths lie in methodological innovation and practical application. However, a deeper evaluation against competing methods, broader testing, and further elaboration on advanced features would enhance its impact and applicability.
"Detecting Past and Future Change Points in Performance Data" - another SETDS paper accepted at the ICTDsC 2024 conference in India
The research paper was accepted for ORAL PRESENTATION at ICTDsC 2024 in India.
The abstract is below.
Wednesday, March 20, 2024
My last role model - Prof. Igor Chelpanov
I just came across a posthumous article about him, in his blessed memory: IN MEMORY OF IGOR BORISOVICH CHELPANOV (in Russian)
It is interesting that I called him my last role model HERE in this blog just 3 months after his death, which I only found out about now (4 years later...).
He had the greatest talent: teaching PhD students. I found him myself after a presentation (about the dynamics of robot grasping devices) given by one of his students, S.N. Kolpashnikov.
This was a turning point in my career (he was my dissertation supervisor) and my entire professional life!
I am immensely grateful to Igor Borisovich and remember him forever!
Friday, December 15, 2023
"Scale in Clouds. What, How, Where, Why and When to Scale" - my new www.CMG.org presentation
Our presentation (with Jignesh Shah) was accepted for www.CMGimpact.com
Scale in Clouds
What, How, Where, Why and When to Scale
Igor Trubin, Jignesh Shah - Capital One bank
ABSTRACT
Presentation includes the following discussion themes.
What to scale: servers, databases, containers, load balancers.
How to scale: horizontally/rightsizing, vertically, manually, automatically, ML based, predictive, serverless.
Where to scale: AWS (ASG, ECS, EKS, ELB), Azure, GCP, K8s.
Why to scale: cost optimization, incidents avoidance, seasonality.
When to scale: auto-scaling policies and parameters, pre-warming to fight latency, correlating with business/app drivers.
The presentation includes a case study of scaling parameter optimization: monitoring, modeling and balancing vertical and horizontal scaling, calculating the optimal initial/desired cluster size, and more.
Tuesday, November 21, 2023
AI!?
Tuesday, September 5, 2023
Me at work.... #datamanagement #rap
Friday, April 21, 2023
My 1st dialog with #ChatGPT about R (converting text data to code that creates a #dataframe).
ME: I have CSV table and need a code to convert that to dataframe.
GPT: Sure! Here is some code that will read a CSV file into a dataframe in R:
# set the working directory to the folder where your CSV file is located
setwd("/path/to/folder")
# read the CSV file into a dataframe
mydata <- read.csv("mycsvfile.csv", header = TRUE)
# view the dataframe
mydata
In this code, you will need to replace "/path/to/folder" with the actual path to the folder where your CSV file is located, and "mycsvfile.csv" with the actual name of your CSV file.
The read.csv() function reads a CSV file and creates a dataframe from it. The header = TRUE argument tells R that the first row of the CSV file contains column names.
After running this code, you should have a dataframe called mydata that contains the data from your CSV file. You can view the dataframe by simply typing mydata into the R console.
ME: I know that, but what if I have no ability to use the file system? No read.csv, no write.csv, just comma-separated text converted to code like the following:
first_column <- c("value_1", "value_2", ...)
second_column <- c("value_1", "value_2", ...)
df <- data.frame(first_column, second_column)
GPT: I see! Here's an example of how you could create a dataframe in R from comma-separated text using the format you provided:
# sample comma-separated text
text <- "value_1, value_2, value_3
value_4, value_5, value_6
value_7, value_8, value_9"
# split the text into separate lines
lines <- strsplit(text, "\n")[[1]]
# split each line into separate values
values <- lapply(lines, function(x) strsplit(x, ", ")[[1]])
# create a dataframe from the values
df <- data.frame(matrix(unlist(values), nrow=length(lines), byrow=TRUE))
# add column names to the dataframe
colnames(df) <- c("first_column", "second_column", "third_column")
# view the dataframe
df
In this code, you would need to replace the text variable with your own comma-separated text. The strsplit() function is used to split the text into lines and then into separate values. The lapply() function is used to apply the strsplit() function to each line of the text.
The data.frame() function is used to create a dataframe from the values. The matrix() function is used to reshape the values into a matrix with the appropriate dimensions.
Finally, the colnames() function is used to add column names to the dataframe. You would need to replace the column names in the colnames() function with the appropriate column names for your data.
____
I tested that and it does not work well.
By googling, I found a better suggestion:
https://sparkbyexamples.com/r-programming/different-ways-to-create-a-dataframe-in-r/
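For completeness, base R can also do this without any file system access: read.csv() (via read.table()) accepts a `text` argument that parses a string directly. A minimal sketch with made-up values:

```r
# Parse comma-separated text into a dataframe without any file I/O,
# using the `text` argument of read.csv().
csv_text <- "first_column,second_column,third_column
value_1,value_2,value_3
value_4,value_5,value_6"

df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
df
# nrow(df) is 2, ncol(df) is 3; df$first_column is c("value_1", "value_4")
```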
Tuesday, December 20, 2022
SETDS (Statistical Exception and Trend Detection System) vs. AWS/SETDS (Split, Extract, Test, Decide, and Score). Is OpenAI right?
My SETDS (Statistical Exception and Trend Detection System) method was created about 20 years ago.
Playing with OpenAI https://chat.openai.com/chat I was given the following information:
"SETDS (Split, Extract, Test, Decide, and Score) is a method for anomaly detection that was developed by researchers at Amazon Web Services (AWS). The method involves the following steps:
Split: The data is split into a training set and a testing set.
Extract: Features are extracted from the training set.
Test: A model is trained on the training set and tested on the testing set.
Decide: Anomaly scores are calculated for each data point in the testing set based on the model's performance.
Score: The anomaly scores are used to identify data points that are likely to be anomalous.
The SETDS method is designed to be fast and efficient, and it has been used in a number of applications, including detecting anomalies in log data and in real-time streaming data. It is one of many methods that can be used for anomaly detection, and the suitability of the method will depend on the specific characteristics of the data and the goals of the analysis."
+++++++++++++++++
Interesting that my SETDS also does some kind of scoring by calculating EV - Exception Value.
Also interesting that I was not able to verify, via Google search, the information the AI chatbot provided....
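For illustration only (my sketch here, not the exact SETDS formula): an EV-style score can be built by summing how far a metric rises above its upper control limit (EV+) and falls below its lower one (EV-); the sign of the net value hints at an upward or downward trend.

```r
# Hypothetical EV-style scoring: sum the exceedances beyond the
# control limits; a positive net EV suggests an upward trend.
ev_score <- function(x, ucl, lcl) {
  ev_plus  <- sum(pmax(x - ucl, 0))   # total upward exceedance
  ev_minus <- sum(pmax(lcl - x, 0))   # total downward exceedance
  list(ev_plus = ev_plus, ev_minus = ev_minus, ev = ev_plus - ev_minus)
}

x <- c(10, 12, 25, 11, 2, 13)         # one spike up (25), one dip (2)
ev_score(x, ucl = 20, lcl = 5)        # ev_plus = 5, ev_minus = 3, ev = 2
```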
Friday, December 16, 2022
Cloud Usage Data. Cleansing, Aggregation, Summarization, Interpretability and Usability (#CMGnews) - my presentation
Friday, December 9, 2022
#CMGImpact 2023 conference announcement of the Trubin's presentation about #clouddata
Monday, November 28, 2022
"#Cloud Usage Data. Cleansing, Aggregation, Summarization, Interpretability and Usability" - CMG Impact'23 presentation (#CMGnews)
My presentation was accepted for CMG Impact'23 (www.CMGimpact.com ) conference (Orlando, FL, Feb. 21-23).
ABSTRACT:
All cloud objects (EC2, RDS, EBS, ECS/Fargate, K8s, Lambda) are elastic and ephemeral. It is a real problem to understand, analyze and predict their behavior. But it is really needed for Cost optimization and Capacity management. The essential requirement to do that is the system performance data. The raw data is collected by observability tools (CloudWatch, DataDog or NewRelic), but it is big and messy.
The presentation is to explain and demonstrate:
- How the data should be aggregated and summarized, addressing the issue of workload jumping from one cluster to another due to rehydration, releases and failovers.
- How the data should be cleaned by anomaly and change point detection without generating false positives on seasonal patterns.
- How to summarize the data to avoid sinking in granularity.
- How to interpret the data to do cost and capacity usage assessments.
- Finally, how to use the cleaned, aggregated and summarized data for Capacity Management using ML/predictive analytics.
Sunday, November 6, 2022
Hybrid #ChangePointDetection system - #Perfomalist
The paper about using #Perfomalist "Change Point Detection for #MongoDB Time Series Performance Regression" was cited in the following paper: "Estimating Breakpoints in Piecewise Linear Regression Using #MachineLearning Methods", where our method was mentioned as " … offer a hybrid change point detection system..."
Tuesday, August 23, 2022
CMG'08 Trip Report
Thursday, March 24, 2022
Our poster presentation "SPEC Research — Introducing the #PredictiveAnalytics Working Group" is scheduled at #ICPE2022 #ICPEconf Poster & Demo (Monday - April 11, 2022, 5:15pm)
Wednesday, March 16, 2022
I am happy to co-author 2 papers for #ICPE2022 #ICPEconf
Online conference program https://icpe2022.spec.org/program_files/schedule/ scheduled our following presentations:
Poster & Demo (Monday - April 11, 2022, 5:15pm )
André Bauer, Mark Leznik, Md Shahriar Iqbal, Daniel Seybold, Igor Trubin, Benjamin Erb, Jörg Domaschka and Pooyan Jamshidi. SPEC Research — Introducing the Predictive Data Analytics Working Group
Data Challenge (Tuesday - April 12, 4:15pm - 4:55pm)
Md Shahriar Iqbal, Mark Leznik, Igor Trubin, Arne Lochner, Pooyan Jamshidi and André Bauer. Change Point Detection for MongoDB Time Series Performance Regression
Monday, February 28, 2022
"Change Point Detection (#ChangeDetection) for MongoDB Time Series Performance Regression" paper for ACM/SPEC ICPE 2022 Data Challenge Track
UPDATE: the paper was published - (LINK to PAPER)
The ACM/SPEC ICPE 2022 - Data Challenge Track Committee has decided to ACCEPT our article:
TITLE: Change Point Detection for MongoDB Time Series Performance Regression
AUTHORS: Md Shahriar Iqbal, Mark Leznik, Igor Trubin, Arne Lochner, Pooyan Jamshidi and André Bauer
CPD - Change Point Detection (#ChangeDetection) is implemented in the free web tool Perfomalist
Wednesday, February 9, 2022
My Cloud Optimization team at #CapitalOne bank won the CMG.org #Innovation Award (#CMGNews)
Thursday, February 3, 2022
My publications in RG got 5000+ reads
Friday, January 21, 2022
Panel Discussion: Roadmap for Cultivating Performance-Aware Software Engineers
"#CloudServers Rightsizing with #Seasonality Adjustments" - my presentation at CMG IMPACT conference (#CMGnews)
Thursday, January 6, 2022
"Performance Anomaly and Change Point Detection for Large-Scale System Management" - my paper published at Springer
Intelligent Sustainable Systems, pp. 403-407
Performance Anomaly and Change Point Detection for Large-Scale System Management
Abstract
The presentation starts with the short overview of the classical statistical process control (SPC)-based anomaly detection techniques and tools including Multivariate Adaptive Statistical Filtering (MASF); Statistical Exception and Trend Detection System (SETDS), Exception Value (EV) meta-metric-based change point detection; control charts; business driven massive prediction and methods of using them to manage large-scale systems such as on-prem servers fleet or massive clouds. Then, the presentation is focused on modern techniques of anomaly and normality detection, such as deep learning and entropy-based anomalous pattern detections.
Keywords
Anomaly detection, Change point detection, Business driven forecast, Control chart, Deep Learning, Entropy analysis
References
- 1. Trubin, I.: Exception based modeling and forecasting. In: Proceedings of Computer Measurement Group (2008)
- 2. Buzen, J., Shum, A.: MASF—multivariate adaptive statistical filtering. In: Proceedings of Computer Measurement Group (1995)
- 3. Trubin, I.: Review of IT control chart. CIS J. 4(11), 2079–8407 (2013)
- 4. Perfomalist Homepage, http://www.perfomalist.com. Last accessed 10 June 2021
- 5. Trubin, I., et al.: Systems and methods for modeling computer resource metrics. US Patent 10,437,697 (2016)
- 6. Trubin, I.: Capturing workload pathology by statistical exception detection. In: Proceedings of Computer Measurement Group (2005)
- 7. Loboz, C.: Quantifying imbalance in computer systems. In: Proceedings of Computer Measurement Group (2011)
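As a toy illustration of the MASF idea referenced above (hour-of-week baselining), reference limits can be built per hour group as mean ± 3·sd of historical data, and the current day's observations checked against their hour's limits. All data and thresholds below are invented for the sketch:

```r
# Toy MASF-style baseline: per-hour reference limits (mean +/- 3*sd)
# built from two deterministic "historical" days, then a current day
# is checked against its hour's limits.
baseline_hours <- rep(0:23, times = 2)
baseline_cpu   <- 50 + 5 * sin(2 * pi * baseline_hours / 24) +
                  rep(c(-1, 1), each = 24)

by_hour <- split(baseline_cpu, baseline_hours)
ucl <- sapply(by_hour, function(v) mean(v) + 3 * sd(v))  # upper limits
lcl <- sapply(by_hour, function(v) mean(v) - 3 * sd(v))  # lower limits

today <- 50 + 5 * sin(2 * pi * (0:23) / 24)
today[15] <- 90                      # inject an anomaly at hour 14

anomalies <- (0:23)[today > ucl | today < lcl]
anomalies                            # hour 14 is flagged
```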
Thursday, December 2, 2021
Dynamics of Anomalies or Phases in a Dynamic Object Life
A dynamic object may have following several phases in its lifetime:
1. Initial phase to set a norm: anomalies cannot be detected, as no baseline sample has been established yet. These data could later be treated as outliers.
2. Stable period without any anomalies.
3. Unstable period when anomalies are appearing: suddenly or with gradually increasing rate.
4. Anomalies establish a new norm, and the rate of anomalies gradually decreases.
5. =>2. The next stable period.
6. =>3. … and so on.
To detect those dynamic object phases, one can use Anomaly and Change Point detection methods. One of them is SETDS (described in this blog), which is now implemented as the www.Perfomalist.com tool.
Here is an example of how the Perfomalist test data (Download Input Data Sample) is used to detect stable and unstable periods.
The data consists of 28 weeks. To see the dynamics and catch when anomalies start appearing, the data was divided into 23 data sets.
- The 1st one has 4 initial weeks (the initial baseline or reference/learning set) plus the following week (the 1st "current" week).
- The 2nd one has 5 initial weeks as the next (one week bigger) baseline and the following week as the next "current" week.
- The 3rd one... the same mechanism as described above.
Then www.Perfomalist.com was applied 23 times (this could be automated using the Perfomalist APIs) and the results were combined into a spreadsheet.
The table and daily summarized charts are below. The results clearly show the 2nd (stable) and 3rd (unstable) phases.
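The growing-baseline loop described above can be sketched as follows (the function, the weekly aggregation, and the 3-sigma rule are my assumptions for illustration, not Perfomalist's exact algorithm):

```r
# For each "current" week, all prior weeks form the baseline; the week
# is "unstable" if its value falls outside mean(baseline) +/- k*sd(baseline).
detect_phases <- function(weekly_values, init_weeks = 4, k = 3) {
  phases <- rep(NA, length(weekly_values))
  for (w in seq(init_weeks + 1, length(weekly_values))) {
    base <- weekly_values[1:(w - 1)]
    ucl  <- mean(base) + k * sd(base)
    lcl  <- mean(base) - k * sd(base)
    phases[w] <- ifelse(weekly_values[w] > ucl | weekly_values[w] < lcl,
                        "unstable", "stable")
  }
  phases
}

weekly <- c(10, 11, 10, 12, 11, 10, 30, 31)  # level shift at week 7
detect_phases(weekly)
# weeks 5-6: "stable"; week 7: "unstable"; week 8: "stable" again,
# because the shift has entered the baseline (the new norm, phase 4).
```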
Tuesday, November 23, 2021
Join me with CMG – your technology community – at #CMGIMPACT22. Use code Trubin at cmgimpact.com/ for 50% off IMPACT tickets cmgimpact.com/register/ #cmgnews #technology #InformationTechnology #ITconference #ContinuingEducation #ProfessionalDevelopment
When the cloud server rightsizing algorithm calculates the baseline level for the current year's application server usage, a seasonal adjustment needs to be calculated and applied by adding the anticipated change, which could increase or decrease the capacity usage. We describe the method and illustrate it against real data.
The cloud server rightsizing recommendations generated with seasonality adjustments reflect seasonal patterns, preventing potential capacity issues and reducing excess capacity.
The ability to keep multi-year historical data for the 4 main subsystems of application servers' capacity usage opens the opportunity to detect seasonality changes and estimate additional capacity needs for CPU, memory, disk I/O, and network. A multi-subsystem approach is necessary, as an application is often not CPU-intensive but rather I/O-, memory-, or network-intensive.
Applying the method daily allows correctly downsizing when the peak season passes and the available capacity should be decreased, which is a good way to achieve cost savings.
In the session, the detailed seasonality adjustment method is described and illustrated against real data. The method is based on the author's SETDS methodology, which treats seasonal variation as an exception (anomaly) and calculates adjustments as variations from a linear trend.
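The variation-from-linear-trend idea can be sketched as follows (the data and variable names are invented for illustration; this is not the exact SETDS computation): fit a linear trend to multi-year monthly usage, average each calendar month's residuals, and add a month's mean residual back to the trend forecast as its seasonal adjustment.

```r
# Invented data: three years of monthly usage with a linear trend
# and a December peak.
t     <- 1:36
usage <- 40 + 0.5 * t + ifelse(t %% 12 == 0, 15, 0)

# Fit the linear trend and average each calendar month's residuals;
# these means serve as the seasonal adjustments.
trend        <- lm(usage ~ t)
month        <- ((t - 1) %% 12) + 1
seasonal_adj <- tapply(residuals(trend), month, mean)

# Seasonally adjusted baseline for next December (t = 48):
next_dec <- predict(trend, newdata = data.frame(t = 48)) + seasonal_adj[[12]]
```

Rightsizing on the unadjusted trend alone would undersize December; the adjustment adds back the anticipated seasonal peak.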
Key Takeaways
- How to build seasonal adjustments into the cloud rightsizing
- Getting familiar with cloud object rightsizing techniques