System Management by Exception: 2020

Monday, December 28, 2020

My Article: "IT-Control Chart" reached 800 reads (#qualitycontrol #capacitymanagement #performanceengineering)

ABSTRACT: The Control Chart is one of the main Six Sigma tools for optimizing business processes. After some adjustments, it is now used as a visualization tool in IT Capacity Management, especially in “behavior learning” products, to highlight performance and capacity usage anomalies. This review answers the following questions. What is the Control Chart, how do you read it, and where do you use it? Which performance tools use it? What are the Control Chart types (MASF charts vs. classical SPC), and what is the IT-Control Chart for IT application performance control? How do you build a Control Chart using Excel for interactive analysis, and using R scripting to do it automatically?
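The "classical SPC" side of that comparison can be made concrete with a minimal Python sketch of a Shewhart individuals (XmR) chart, the simplest classical control chart. The center line is the average of the observations, and the limits sit 2.66 times the average moving range away from it. This illustrates the textbook technique only. It is not the Excel or R implementation from the paper, and the function names and sample data are hypothetical.

```python
from statistics import mean

def individuals_chart_limits(values):
    """Return (LCL, center line, UCL) of a classical SPC individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two points.
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

def out_of_control(values):
    """Indices of observations that fall outside the control limits."""
    lcl, _, ucl = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

if __name__ == "__main__":
    cpu_busy = [50, 52, 49, 51, 50, 53, 48, 51, 50, 90]  # made-up sample data
    print(out_of_control(cpu_busy))  # prints [9]
```

A classical chart like this uses one pair of limits for the whole series. MASF-style IT-Control Charts instead compute separate limits for each hour of the week from a baseline period, which fits the strong daily and weekly seasonality of IT workloads.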

The full paper is HERE

Igor Trubin

He started in 1979 as an IBM/370 system engineer. In 1986 he received his PhD in Robotics from St. Petersburg Technical University (Russia) and then worked as a professor teaching CAD/CAM and Robotics for 12 years. He published 30+ papers and made several presentations at conferences in the Robotics and Artificial Intelligence fields. In 1999 he moved to the US and worked at Capital One bank as a Capacity Planner. His first CMG.org paper was written and presented in 2001. The next one, "Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented at UKCMG'03 in Oxford, England. He made other technical presentations at IBM z/Series Expo, SPEC.org, and Southern and Central Europe CMG, and ran several workshops covering his original method of Anomaly and Change Point Detection (Perfomalist.com). He is the author of the “Performance Anomaly Detection” class (at CMG.com). He worked 2 years as the Capacity team lead for IBM, worked for SunTrust Bank for 3 years, and then worked at IBM for 3 years as a Sr. IT Architect. Now he works for Capital One bank as an IT Manager in Cloud Engineering, and since 2015 he has been a member of the CMG.org Board of Directors. He runs the YouTube channel iTrubin.

Wednesday, December 9, 2020

#AWS #Re:invent - Capacity Planning Use Case

From the following session:

How Capital One manages the health of its applications on AWS

https://virtual.awsevents.com/media/1_566pwp7p

As your applications grow, your resources can scale across multiple accounts and AWS Regions. In this session, learn how you can intelligently automate your applications on AWS using DevOps, machine learning, and support tools from AWS such as AWS Health, AWS Trusted Advisor, Service Quotas, AWS Config, and the AWS Well-Architected Tool. Additionally, hear how Capital One uses AWS Health across accounts to monitor the health of their applications on AWS at scale.

Tipu Qureshi, AWS speaker:

Igor Trubin

Friday, December 4, 2020

This year at #CMGIMPACT2021, I’ll be a speaker on "Cloud Resources Workload Profiling". Anyone who would like to attend can contact me for a discount code that saves 40% off the registration price. #CMGnews

Find more information on the IMPACT 2021 Virtual Conference here: cmgimpact.com/home2021/


#CMGnews: "Detection of Performance #Anomaly using DESOM" - #cmgimpact2021 session

Deep Embedded Self Organizing Map (DESOM), a hybrid Deep Neural Network based Autoencoder-Decoder (AE-DE) with an embedded Self Organizing Map (SOM), is applied successfully for the first time to detect anomalies in the performance metrics of mobile network entities with over 94% accuracy. SOM has been widely used for anomaly detection in many areas, such as fraud detection and intrusion detection. DESOM is a recent enhancement of SOM, but it had not been evaluated as a practical solution to real problems prior to this work. Several novel methods to detect concept drift using the intrinsic features of DESOM have been incorporated into the complete solution pipeline.
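The core SOM idea behind this kind of anomaly detection can be shown with a plain (non-deep) SOM. You train the map on normal data, then score each new sample by its distance to the best matching unit, which is called the quantization error. The Python/NumPy sketch below covers only that plain-SOM baseline. It does not include the DESOM autoencoder-decoder or the concept-drift methods from the talk. The grid size, learning-rate schedule and function names are assumptions.

```python
import numpy as np

def train_som(data, grid=(5, 5), epochs=20, lr0=0.5, seed=0):
    """Train a plain Self Organizing Map with the classic online update rule."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Unit weights start as randomly chosen training samples.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    # Grid coordinates of each unit, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best matching unit
            dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighborhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def quantization_error(weights, samples):
    """Anomaly score: distance from each sample to its best matching unit."""
    dists = np.linalg.norm(samples[:, None, :] - weights[None, :, :], axis=2)
    return dists.min(axis=1)
```

As an assumed alerting rule, one could take a high percentile of the training scores (for example the 99th) as the threshold. New samples that score above it would then be flagged as anomalies.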

Speakers
Jayanta Choudhury
Senior Data Scientist
Ericsson Inc.
Santa Clara, California United States

Anila Joshi
Sr. Data Science Manager
Ericsson Inc.
Santa Clara, California United States

Track
Performance Engineering and DevOps

https://cmgimpact.com/detection-of-performance-anomaly-using-desom/


Tuesday, December 1, 2020

#AWS #Re:invent virtual conference activities log

I will be posting here the re:Invent sessions that relate to this blog's topics...

Machine Learning Keynote (https://virtual.awsevents.com/media/1_07cg4srl)

Join Swami Sivasubramanian for the first-ever Machine Learning Keynote, live at re:Invent. Hear how AWS is freeing builders to innovate on machine learning with the latest developments in AWS machine learning, demos of new technology, and insights from customers.

_______________________________

THURSDAY, DECEMBER 3RD 2020

4:15 PM TO 4:45 PM EST

Productionizing R workloads using Amazon SageMaker, featuring Siemens

https://virtual.awsevents.com/media/t/1_96sfa6x2/186984013

R language and its 16,000+ packages dedicated to statistics and ML are used by statisticians and data scientists in industries such as energy, healthcare, life science, and financial services. Using R, you can run simulations and ML securely and at scale with Amazon SageMaker, while reducing the cost of development by using the fully elastic resources in the cloud. Learn how Siemens Energy, a technology provider for more than one-sixth of the global electricity generation, with more than 91,000 employees and presence in more than 90 countries, is enabling new digital products with Amazon SageMaker to build, train, and deploy statistical and ML models in R at scale.

DEC 1, 2020 | 4:45 PM - 5:15 PM EST (now...)
DEC 2, 2020 | 12:45 AM - 1:15 AM EST
DEC 2, 2020 | 8:45 AM - 9:15 AM EST

[NEW LAUNCH!] Detect abnormal equipment behavior by analyzing sensor data

Industrial companies are constantly working to avoid unplanned downtime due to equipment failure and to improve operational efficiency. Over the years, they have invested in physical sensors, data connectivity, data storage, and dashboarding to monitor equipment and get real-time alerts....

__________________

My activities: http://www.trub.in/2020/12/aws-reinvent-virtual-conference.html

DEC 2, 2020 | 2:15 PM - 2:45 PM EST
DEC 2, 2020 | 10:15 PM - 10:45 PM EST
DEC 3, 2020 | 6:15 AM - 6:45 AM EST

[NEW LAUNCH!] Amazon Lookout for Vision

Figuring out if a part has been manufactured correctly, or if a machine part is damaged, is vitally important. Making this determination usually requires people to inspect objects, which can be slow and error-prone.

_________________________

Dec 2, 8 pm: So far, the most interesting session for me as a cloud capacity manager is:

"Optimize compute for performance and cost"

"It’s easier than ever to grow your compute capacity and enable new types of cloud computing applications while maintaining the lowest total cost of ownership (TCO) by blending EC2 Spot Instances, On-Demand Instances, and Savings Plans purchase models. In this session, learn how to use the power of EC2 Fleet with AWS services such as Amazon EC2 Auto Scaling, Amazon ECS, Amazon EKS, Amazon EMR, and AWS Batch to programmatically optimize costs while maintaining high performance and availability. Dive deep into cost-optimization patterns for workloads such as containers, web services, CI/CD, batch, big data, and more."

Most interesting slides from that session:

I plan to go through the suggested workshops to learn in detail how AWS recommends handling capacity usage, rightsizing and cost optimization for ASG, EMR, ECS, EKS and AWS Batch.


#AWS #Re:invent virtual conference activities (I support Capital One's sponsor booth there!)

I hope you enjoy the conference. I will be at the booth of Capital One (a platinum sponsor) to answer questions from the virtual chat at the following times:
Wednesday, December 2⋅2:30 – 5:00pm
Thursday, December 3⋅12:00 – 2:30pm
Tuesday, December 8⋅12:00 – 2:30pm
Wednesday, December 9⋅2:30 – 5:00pm
Wednesday, December 16⋅2:30 – 5:00pm
Thursday, December 17⋅2:30 – 5:00pm
______________________________________
Please come over to our page there and chat! Also, please check the list of sessions related to this blog's topics: http://www.trub.in/2020/12/aws-reinvent-virtual-conference_1.html


Friday, November 6, 2020

"Cloud Resources Workload Profiling" - my new presentation #cmgimpact2021 (#CMGnews #CloudComputing )

(https://cmgimpact.com/cloud-resources-workload-profiling/)

Abstract: How can you be sure that a cloud object’s (e.g., AWS EC2, RDS or EBS) workload fits its rightsized resources (compute, RAM, IO/s and network traffic)? This is very difficult to do using raw system performance data from monitoring tools. The best way to do it is with a weekly workload profile, a graphical visualization in the form of a MASF IT-Control Chart. This chart shows the stability of the workload and reveals recent anomalies. Examples include runaway processes, memory leaks and, especially important for cloud objects, an unusual number of hours the object was down. All of these are compared with the usual weekly pattern.

This presentation will describe how to build, read, and use workload profiles using real data examples, and will demonstrate how cloud capacity scaling can be verified.
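A weekly workload profile of this kind can be sketched in a few lines of Python. The steps are:

1. Bucket a baseline period of hourly measurements by hour of the week (168 buckets).
2. Compute mean ± k·σ per bucket as MASF-style control limits.
3. Flag current-week hours that fall outside those limits.

The function names, the k=3 default and the hourly granularity are assumptions for illustration, not the method's official parameters. A value near 0 in a normally busy hour is how an unusual down period would show up below the LCL.

```python
from collections import defaultdict
from statistics import mean, pstdev

def weekly_profile(baseline, k=3.0):
    """Build a MASF-style weekly profile from (datetime, value) baseline samples.

    Returns {(weekday, hour): (LCL, mean, UCL)} with limits at mean +/- k * stdev.
    """
    buckets = defaultdict(list)
    for ts, value in baseline:
        buckets[(ts.weekday(), ts.hour)].append(value)
    profile = {}
    for key, values in buckets.items():
        m, s = mean(values), pstdev(values)
        profile[key] = (m - k * s, m, m + k * s)
    return profile

def exceptions(profile, current):
    """List (datetime, value, label) for current samples outside the profile limits."""
    found = []
    for ts, value in current:
        limits = profile.get((ts.weekday(), ts.hour))
        if limits is None:  # no baseline data for this hour of the week
            continue
        lcl, _, ucl = limits
        if value > ucl:
            found.append((ts, value, "above UCL"))
        elif value < lcl:
            found.append((ts, value, "below LCL"))
    return found
```

In practice the baseline would cover several past weeks. For metrics that cannot go negative, the LCL could also be clipped at zero.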


Wednesday, October 7, 2020

"Optimization & Improving Performance to Keep Clouds Light" - another capitalone.com/tech/ post with my contribution

A new article has just been published with a reference to my previous capitalone.com/tech/ post (see My External Post on Capitalone.com/tech/ - Optimizing Your Public Cloud for Maximum Efficiency) and with a short quote from me:

"...Data Analyst Igor Trubin, explains the mechanics behind data collection. “We found a way to collect adequate cloud usage performance data to automatically recognize the workload patterns of different cloud objects and their subsystems (clusters of servers, databases, containers, disk volumes and networks), including the cost of using them.” Some details of the method for doing that we shared in the following tech blog post: Optimizing Your Public Cloud for Maximum Efficiency..."


Wednesday, September 16, 2020

To Work from Home or to Work from Work?

Below are my comments on the following post by my former manager:

https://www.linkedin.com/pulse/i-want-work-from-think-kevin-mclaughlin/?trackingId=WMDA4QXhQ9uJ4H%2BsAJImpg%3D%3D

What do I miss (working 100% from home)?

Whiteboarding. Definitely! I used to do it a lot:

- explaining my ideas to developers, so they could implement what I wanted not blindly but with passion. Now, working from home, I spend twice as much time on Zoom and am still not sure I ignite that passion;

- pitching my new and innovative concepts to my boss. Now they have to listen to or read my Ruglish, so my ability to convince them to accept my ideas is declining...

Serendipity. ...Less impactful hallway or over-the-cube-wall conversations can also solve problems and create bonds...

I feel that too. But in a DevOps-Agile-Slack-Zoom environment I am getting used to getting what I need. My hobby of always being online, blogging and presenting (now at virtual conferences and seminars), helps a lot.

Rhythms. Yes, life cycles have changed. But maybe it is for the good. At least for me, it is not bad to break current routines and establish new ones. With kids growing up, it is unavoidable anyway...


Tuesday, September 15, 2020

My CMG Video presentation "Catching Anomaly and Normality in Cloud by Neural Net and Entropy Calculation"


Saturday, August 22, 2020

CPD - Change Point Detection (#ChangeDetection) is implemented in the free web tool Perfomalist

UPDATE 11/21/2021

The method is implemented as a Perfomalist API: https://www.trutechdev.com/2021/11/the-change-points-detection-perfomalapi.html.

Note that the API has tuning parameters that correspond to the ones explained below:

sValue - statistical band in % (100 means UCL = MAX; 0 means UCL = LCL = mean); corresponds to N, the normality confidence band;
eValue - Exception Value (EV) threshold in % of the actual historical average; corresponds to I, the model insensitivity;
BaseLineLength - the time period to compare the current value against.
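The exact formulas behind these API parameters are not given here, so the following Python sketch shows only one plausible reading:

- sValue linearly widens the band from the mean (0) up to MAX and MIN (100).
- eValue turns a percentage of the historical average into an absolute EV threshold.
- BaseLineLength selects how many of the most recent observations form the baseline.

The function name and the symmetric treatment of the LCL (moving toward MIN) are assumptions.

```python
from statistics import mean

def cpd_limits(history, s_value, e_value, baseline_length):
    """Hypothetical mapping of sValue/eValue/BaseLineLength to (LCL, UCL, EV threshold).

    history: past observations, not including the value being checked.
    """
    if not 0 <= s_value <= 100:
        raise ValueError("s_value must be a percentage between 0 and 100")
    if baseline_length < 1:
        raise ValueError("baseline_length must be at least 1")
    # Baseline: the most recent baseline_length observations.
    baseline = history[-baseline_length:]
    avg = mean(baseline)
    frac = s_value / 100.0
    # Assumed linear band: 0 -> UCL = LCL = mean, 100 -> UCL = MAX and LCL = MIN.
    ucl = avg + frac * (max(baseline) - avg)
    lcl = avg - frac * (avg - min(baseline))
    # eValue is a percentage of the historical average; convert it to an absolute EV threshold.
    ev_threshold = e_value * abs(avg) / 100.0
    return lcl, ucl, ev_threshold
```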

_______________________________________

The next version of Perfomalist (https://www.perfomalist.com/) is coming and will include new functionality: Change Point Detection.

How to find a change in the historical time-series data?

Long ago I developed a method to do that, based on EV data (Exception Value: the magnitude of anomalies collected historically).

Idea: any change first appears as an anomaly and then becomes a normality (the norm). Collecting and analyzing the severity of all anomalies therefore makes it possible to find phases in the history with different patterns. To detect that mathematically, one just needs to find all roots of the following equation: EV(t)=0, where t is time. But that is too simple, as it might give you too many change points. To control the sensitivity of change point detection, the method needs some tuning parameters, such as the following:

N - normality confidence band in percentiles = UCL-LCL (100% means all observations are normal; 0% means all observations are abnormal)

I - model insensitivity = EV threshold (0 means maximum sensitivity, which gives the maximum number of change points for the given confidence band).

Accordingly, a more accurate model is defined by the following equation:

|EV(t,N)|=I or

|EV(t,UCL-LCL)|=I

Where UCL is upper control limit and LCL is lower control limit. Why "||" (absolute value)? To catch two types of changes: going up- and downwards.

How the EV(t) function is defined is explained in the following white paper:

(PDF) Exception Based Modeling and Forecasting.
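A discrete version of this change-point rule can be sketched in Python. The sketch makes three simplifying assumptions:

- EV for a period is the sum of excesses above the UCL plus the (negative) shortfalls below the LCL.
- UCL and LCL come from a rolling baseline, with the band width set by N (0 gives UCL = LCL = mean, 100 gives MIN and MAX).
- A change point is reported at each period where |EV(t)| - I changes sign.

The exact EV(t) definition is in the white paper referenced above, and this is not the Perfomalist implementation. The function names are hypothetical, and I is an absolute EV value here rather than a percentage.

```python
from statistics import mean

def band_limits(baseline, n_band):
    """Assumed band: n_band=100 -> [MIN, MAX], n_band=0 -> LCL = UCL = mean."""
    avg = mean(baseline)
    frac = n_band / 100.0
    return avg - frac * (avg - min(baseline)), avg + frac * (max(baseline) - avg)

def exception_value(values, lcl, ucl):
    """Assumed EV: excess above UCL (positive) plus shortfall below LCL (negative)."""
    above = sum(v - ucl for v in values if v > ucl)
    below = sum(v - lcl for v in values if v < lcl)
    return above + below

def ev_series(periods, n_band, baseline_len):
    """EV(t) for every period t that has baseline_len earlier periods as its baseline."""
    evs = []
    for t in range(baseline_len, len(periods)):
        baseline = [v for p in periods[t - baseline_len:t] for v in p]
        lcl, ucl = band_limits(baseline, n_band)
        evs.append(exception_value(periods[t], lcl, ucl))
    return evs

def change_points(periods, n_band, insensitivity, baseline_len):
    """Period indices where |EV(t)| - I changes sign: discrete roots of |EV(t,N)| = I."""
    g = [abs(ev) - insensitivity for ev in ev_series(periods, n_band, baseline_len)]
    return [baseline_len + t for t in range(1, len(g)) if (g[t - 1] > 0) != (g[t] > 0)]
```

Take a series that steps from about 10 to about 20, with N=100, I=1 and a two-period baseline. The sketch reports two roots. The first is the period where the step shows up as an anomaly. The second is the period where the rolling baseline has absorbed the step, so it has become the norm. Raising I above the size of that anomaly suppresses both.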

This CPD method has been coded as a Python program by one of the Perfomalist developers, and here is a test result:

The approach is currently being tested, and the corresponding microservice (API) is being developed. Stay tuned!

