Monday, November 28, 2022

"#Cloud Usage Data. Cleansing, Aggregation, Summarization, Interpretability and Usability" - CMG Impact'23 presentation (#CMGnews)

My presentation was accepted for the CMG Impact'23 conference (www.CMGimpact.com), Orlando, FL, Feb. 21-23.

ABSTRACT:

All cloud objects (EC2, RDS, EBS, ECS/Fargate, K8s, Lambda) are elastic and ephemeral, which makes it hard to understand, analyze and predict their behavior. Yet that is exactly what cost optimization and capacity management need. The essential input is system performance data. Observability tools (CloudWatch, DataDog or NewRelic) collect the raw data, but it is big and messy.

The presentation explains and demonstrates:

- How the data should be aggregated and summarized to handle workloads jumping from one cluster to another due to rehydration, releases and failovers.

- How the data should be cleaned by anomaly and change point detection without generating false negatives like seasonality.

- How to summarize the data to avoid drowning in granularity.

- How to interpret the data for cost and capacity usage assessments.

- Finally, how to use the clean, aggregated and summarized data for capacity management with ML/predictive analytics.
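The aggregation and summarization steps listed above can be illustrated with a minimal sketch. It is not the method from the presentation; it only assumes that usage samples carry an application name as well as a cluster name, and that only one cluster serves the application at a time. The sample data, the field layout and the `hourly_by_application` helper are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical raw samples: (timestamp, application, cluster, cpu_percent).
# The "checkout" workload moves from cluster "blue" to "green" at 10:00,
# for example because of a release or a failover.
RAW = [
    ("2022-11-28 09:10", "checkout", "blue", 40.0),
    ("2022-11-28 09:40", "checkout", "blue", 50.0),
    ("2022-11-28 10:10", "checkout", "green", 44.0),
    ("2022-11-28 10:40", "checkout", "green", 48.0),
]

def hourly_by_application(samples):
    """Average a metric per (application, hour), ignoring the cluster name."""
    buckets = defaultdict(list)
    for stamp, app, _cluster, value in samples:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(minute=0)
        buckets[(app, hour)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}

SUMMARY = hourly_by_application(RAW)
for (app, hour), value in SUMMARY.items():
    print(app, hour.strftime("%Y-%m-%d %H:00"), value)
```

Keying the series by application instead of by cluster keeps one continuous history across the move, and the hourly average reduces the granularity before any further analysis.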






Sunday, November 6, 2022

Hybrid #ChangePointDetection system - #Perfomalist

The paper about using #Perfomalist, "Change Point Detection for #MongoDB Time Series Performance Regression", was cited in the paper "Estimating Breakpoints in Piecewise Linear Regression Using #MachineLearning Methods", where our method was mentioned as " … offer a hybrid change point detection system..."

Tuesday, August 23, 2022

CMG'08 Trip Report

Visualization and Analysis of Performance Data using R
Speaker: Jim Holtman
Summary: I did not attend this session, but it is about a free statistical and graphical tool (the “R” tool and the “S” language, http://www.r-project.org/). Note: an open library offers an interface to SAS datasets: http://lib.stat.cmu.edu/S/dataset Functions that define and manipulate S "dataset" objects. A dataset is a matrix whose columns (variables) may be of different data types. Though motivated by a need to interface to SAS, they are useful in any data analysis. There is also a function related to SPC: JohnsonSystem (http://lib.stat.cmu.edu/S/JohnsonSystem.q). In 2004 he published a CMG paper about R usage: The Use of R for System Performance Analysis. See also the lecture Graphing in R (http://www.ats.ucla.edu/stat/r/library/lecture_graphing_r.htm) or http://ieee.cincinnati.fuse.net/R_IEEE_V2.pdf
Major takeaways: R might be a good SAS/Graph replacement. I am also thinking about writing an "S" program that builds SEDS-type control charts to illustrate how they work; that could be used for a workshop similar to the one Mr. Holtman had done.

Automating Process Pathology Detection – Rule Engine Design Hints
Speaker: Ron Kaminski
Summary: This is about an analytical approach to capturing pathologies such as run-away processes and memory leaks. BTW, Ron referenced my papers as an example of a different (statistical) approach to the same problem. This is a continuation of his previous work in this field: http://www.cmg.org/proceedings/2003/3027.pdf In a private conversation he expressed some interest in putting both approaches together to see how that works from different angles... I am open to that.

CMG-T: Modeling and Forecasting
Speaker: Dr. Michael A. Salsburg
Summary: Just a good overview and tutorial on queuing theory and simulation-based modeling and forecasting, as opposed to the statistical modeling and forecasting I presented in my paper.

eBay - the Shape of Infrastructure to Come
Speaker: Paul Strong
Summary: Cloud computing is “Outsourcing 2.0”; sooner or later even banks will take that approach and use on-demand capacity from the cloud instead of running their own computer farms.

Exception Based Modeling and Forecasting
Speaker: Dr. Igor A. Trubin
Summary: This is my presentation, which was successful and attracted more than 60 attendees. There were a lot of questions and comments during and before this session. Positive comments came from Mark Friedman (after I had to clarify the 3-D concept of weekly control charts for him; my bad, I was probably not very clear presenting that) and from Ron Kaminski, who expressed some interest in my EV algorithm for capturing recent bad trends, as it solves some problems of workload pathology recognition that he has been working on recently.

So You Want to Manage Your z-Series MIPS? Then Detect & Control Application Workload Variance!
Speaker: John S. Van Wagenen, Caterpillar
Summary: Unfortunately I could not attend this session, as I presented mine at the same time. But this paper is about a SEDS-like approach to managing mainframe capacity! And that presentation got the prestigious Mullen award! There is a similar paper written by the same author last year: Performance Monitoring Process for Out of Standard Applications
Major takeaways: The SEDS approach is valid, and our implementation on the mainframe might be adjusted using this paper's methodology.

Predicting the Relative Performance of CPU
Speaker: Debbie Sheetz
Summary: I used a similar approach in the past (see my 1st CMG paper and the 1st figure in my last paper) and know how challenging it is to apply SPEC or other benchmarks to real servers with different configurations.
Major takeaways: This paper could be helpful in some consolidation projects.

Panel: Michelson Panel - Visualization
Speaker: Jeff Buzen
Summary: It was interesting to see different ways to present data visually. During the panel discussion I realized that my weekly control charts, and especially the 3-D version of them, are kind of unique. I even approached Dr. Buzen with my comments about that…

Mainstream NUMA and the TCP/IP stack
Speaker: Mark B. Friedman
Summary: This is a brilliant but very scary paper. Two scary points: A. For multicore servers the speed of memory access could be unpredictable and sometimes deadly slow because of NUMA (non-uniform memory access). And there are no metrics or tools to measure that! B. A high-performance network (1-10 Gb and higher) cannot be fully utilized, because it might consume all CPU cycles just to process network-related interrupts.
Major takeaways: It’s OK if network interface bandwidth utilization is low. And we should be careful when using modern multicore processors (8 and more cores).

Performance and Capacity Management in an Outsourced Environment
Speaker: Jeff Hammond
Summary: This is very useful information about what we could expect when working with an outsourced service (people) or if we get outsourced ourselves. It confirms my own experience.
Action Items: Be prepared, just in case!

Thursday, March 24, 2022

Our poster presentation "SPEC Research — Introducing the #PredictiveAnalytics Working Group" is scheduled at #ICPE2022 #ICPEconf Poster & Demo (Monday - April 11, 2022, 5:15pm)


    https://icpe2022.spec.org/program_files/schedule/

Wednesday, March 16, 2022

I am happy to co-author 2 papers for #ICPE2022 #ICPEconf

The online conference program (https://icpe2022.spec.org/program_files/schedule/) lists our following presentations:

Poster & Demo (Monday - April 11, 2022, 5:15pm )

André Bauer, Mark Leznik, Md Shahriar Iqbal, Daniel Seybold, Igor Trubin, Benjamin Erb, Jörg Domaschka and Pooyan Jamshidi. SPEC Research — Introducing the Predictive Data Analytics Working Group

Data Challenge (Tuesday - April 12, 4:15pm - 4:55pm)

Md Shahriar Iqbal, Mark Leznik, Igor Trubin, Arne Lochner, Pooyan Jamshidi and André Bauer. Change Point Detection for MongoDB Time Series Performance Regression



Monday, February 28, 2022

"Change Point Detection (#ChangeDetection) for MongoDB Time Series Performance Regression" paper for ACM/SPEC ICPE 2022 Data Challenge Track

The ACM/SPEC ICPE 2022 - Data Challenge Track Committee has decided to ACCEPT our article:

TITLE: Change Point Detection for MongoDB Time Series Performance Regression
AUTHORS: Md Shahriar Iqbal, Mark Leznik, Igor Trubin, Arne Lochner, Pooyan Jamshidi and André Bauer



ABSTRACT
Commits to the MongoDB software repository trigger a collection
of automatically run tests. Here, the identification of commits 
responsible for performance regressions is paramount. Previously, the
process relied on manual inspection of time series graphs to identify
significant changes, later replaced with a threshold-based detection
system. However, neither system was sufficient for finding changes
in performance in a timely manner. This work describes our recent
implementation of a change point detection system built upon the
Perfomalist approach in combination with XGBoost algorithm. The
algorithm produces a list of change points representing significant
changes from a given history of performance results. We are able
to automatically detect change points and achieve an 83% accuracy,
all while reducing the human effort in the process.
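
To give a feel for what a change point in a series of performance results looks like, here is a minimal sketch. It is neither the Perfomalist approach nor XGBoost; it is a much simpler single mean-shift detector, swapped in purely for illustration. The `find_change_point` function, its parameters and the latency numbers are all hypothetical.

```python
from statistics import mean, pstdev

def find_change_point(series, min_segment=3, min_shift=2.0):
    """Return the index where the mean shifts most, or None if no shift is large enough.

    A split at index i separates series[:i] from series[i:]. The shift is
    measured in units of the size-weighted within-segment standard deviation.
    """
    best_index, best_score = None, 0.0
    for i in range(min_segment, len(series) - min_segment + 1):
        left, right = series[:i], series[i:]
        spread = (pstdev(left) * len(left) + pstdev(right) * len(right)) / len(series)
        spread = max(spread, 1e-9)  # avoid division by zero on flat segments
        score = abs(mean(right) - mean(left)) / spread
        if score > best_score:
            best_index, best_score = i, score
    return best_index if best_score >= min_shift else None

# Hypothetical benchmark latencies (ms), one per commit.
# A regression lands at index 6 (0-based).
LATENCY = [100, 102, 99, 101, 100, 98, 120, 121, 119, 122, 120, 118]
print(find_change_point(LATENCY))
```

This sketch finds at most one change point per series; a production system would need to handle multiple change points, trends and noisy tests, which is where a combined approach like the one in the paper comes in.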

More details on the Perfomalist approach can be found in this blog post:

Wednesday, February 9, 2022

My Cloud Optimization team at #CapitalOne bank won the CMG.org #Innovation Award (#CMGNews)

  https://www.cmg.org/2022/02/capital-one-announced-as-winner-of-the-impact-innovation-award/




Thursday, February 3, 2022

My publications on ResearchGate (RG) got 5000+ reads

https://www.researchgate.net/profile/Igor-Trubin 



Friday, January 21, 2022

Panel Discussion: Roadmap for Cultivating Performance-Aware Software Engineers

 

"#CloudServers Rightsizing with #Seasonality Adjustments" - my presentation at CMG IMPACT conference (#CMGnews)


Feb 4, 2022 12:15 Virtual at https://cmgimpact.com/sessions-schedule/

Thursday, January 6, 2022

"Performance Anomaly and Change Point Detection for Large-Scale System Management" - my paper published at Springer

 


Intelligent Sustainable Systems, pp. 403-407

Performance Anomaly and Change Point Detection for Large-Scale System Management

Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 334)

Abstract

The presentation starts with the short overview of the classical statistical process control (SPC)-based anomaly detection techniques and tools including Multivariate Adaptive Statistical Filtering (MASF); Statistical Exception and Trend Detection System (SETDS), Exception Value (EV) meta-metric-based change point detection; control charts; business driven massive prediction and methods of using them to manage large-scale systems such as on-prem servers fleet or massive clouds. Then, the presentation is focused on modern techniques of anomaly and normality detection, such as deep learning and entropy-based anomalous pattern detections.

Keywords

Anomaly detection, Change point detection, Business driven forecast, Control chart, Deep Learning, Entropy analysis
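
The SPC-based techniques named in the abstract can be sketched in a few lines. This is a rough illustration under stated assumptions, not the SETDS implementation: the baseline is grouped by hour of day (a simplification of MASF-style grouping), the limits are mean plus or minus a number of standard deviations, and the Exception Value (EV) is read here as the signed sum of excursions outside the limits. The function names and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def control_limits(history, sigmas=3.0):
    """Build per-hour (lower, upper) limits from (hour, value) history samples."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    limits = {}
    for hour, values in by_hour.items():
        center, spread = mean(values), pstdev(values)
        limits[hour] = (center - sigmas * spread, center + sigmas * spread)
    return limits

def exception_value(actual, limits):
    """Sum signed excursions outside the limits; 0 means no exceptions."""
    total = 0.0
    for hour, value in actual:
        lower, upper = limits[hour]
        if value > upper:
            total += value - upper   # positive: above the upper limit
        elif value < lower:
            total += value - lower   # negative: below the lower limit
    return total

# Hypothetical CPU % history: three earlier days sampled at hours 9 and 10.
HISTORY = [(9, 40.0), (9, 42.0), (9, 44.0), (10, 60.0), (10, 60.0), (10, 66.0)]
# sigmas=1.0 keeps this tiny example sensitive; 3.0 is the usual SPC choice.
LIMITS = control_limits(HISTORY, sigmas=1.0)
TODAY = [(9, 43.0), (10, 70.0)]
EV = exception_value(TODAY, LIMITS)
print(LIMITS, EV)
```

In this reading, an EV of zero means the day stayed within its control limits, while a run of nonzero values could serve as a signal for a change point.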

References

  1. Trubin, I.: Exception based modeling and forecasting. In: Proceedings of Computer Measurement Group (2008)
  2. Buzen, J., Shum, A.: MASF: multivariate adaptive statistical filtering. In: Proceedings of Computer Measurement Group (1995)
  3. Trubin, I.: Review of IT control chart. CIS J. 4(11), 2079–8407 (2013)
  4. Perfomalist Homepage, http://www.perfomalist.com. Last accessed on 10 June 2021
  5. Trubin, I., et al.: Systems and methods for modeling computer resource metrics. US Patent 10,437,697 (2016)
  6. Trubin, I.: Capturing workload pathology by statistical exception detection. In: Proceedings of Computer Measurement Group (2005)
  7. Loboz, C.: Quantifying imbalance in computer systems. In: Proceedings of Computer Measurement Group (2011)