Monday, August 14, 2017

I will present at #imPACt2017 conference - "The Model Factory - Correlating Server and Database Utilization with Customer Activity"

The abstract and more info can be found here:


Presentation is scheduled in New Orleans, Louisiana at the Loews New Orleans Hotel:
Session Number:  362
Subject Area:  CAP
Session Date and Time: 11/8/2017, 2:20 PM-2:50 PM
Room Assignment:  Beauregard

See conference details here: http://cmgimpact.com/ 
You are welcome to attend!

Thursday, July 27, 2017

Igor = I go R. I have redeveloped SETDS on R = SonR

The 1st attempt to go from SAS to R:

- R script to run in SAS: one more way to build an IT-Control Chart

The 1st attempt to build SEDS Control Charts using R:

My proposal to build SETDS on any open-source platform (including R):

Using RODBC package against MySQL data to build SEDS control charts:

SETDS is actually a 2-level (1. exception detection and 2. trend detection) machine-learning-based anomaly detection method. It competes with other anomaly detection methods that are increasingly being implemented in R:
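A minimal sketch of the two-level idea (the function names and data below are illustrative, not the actual SETDS/SonR code): level 1 flags intervals whose value breaks MASF-style control limits, and level 2 looks for a trend in the resulting exception values.

```python
import statistics

def exception_value(history, current, n_sigmas=3.0):
    # Level 1 (exception detection): MASF-style control limits
    # built from a reference set of past observations for the same hour.
    mean = statistics.mean(history)
    sd = statistics.pstdev(history)
    upper, lower = mean + n_sigmas * sd, mean - n_sigmas * sd
    if current > upper:
        return current - upper   # magnitude of the upper-limit breach
    if current < lower:
        return current - lower   # negative: lower-limit breach
    return 0.0                   # inside the limits: no exception

def has_upward_trend(ev_series, min_run=3):
    # Level 2 (trend detection): a run of growing positive exception
    # values suggests a systematic trend rather than a one-off spike.
    tail = ev_series[-min_run:]
    if len(tail) < min_run:
        return False
    return all(ev > 0 for ev in tail) and all(b > a for a, b in zip(tail, tail[1:]))
```

Exception values from successive intervals feed the trend detector, so a single outlier raises only a level-1 flag while a growing run raises a level-2 one.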

Finally, SETDS was implemented on R and named SonR.

Friday, July 14, 2017

The 10-year anniversary of running my tech blog - my 212th post. SUBSCRIBE!

Time flies. 10 years ago in June 2007 I wrote: 

"To keep the discussion about how to Manage computer Systems by Exception (e.g. by  using SPC, APC, MASF, 6-SIGMA, SETDS and other techniques), I run this blog and also publish/present white papers at the www.CMG.org. ..."

And now I am posting the 212th post... All in all, that makes my dream of publishing a monographic book come true, and I am happy that my "virtual" online "book" has been visited ~135,000 times:

You can see the history has its ups and downs, following my career path. I still plan to keep posting...

Thank you for viewing!

Thursday, June 29, 2017

My CMG'05 paper was cited in the PhD thesis "Finding External Indicators of Load on a Web Server via Analysis of Black-Box Performance Measurements"

from MarkLogic Corporation
Thesis for: PhD, Advisor: Dr. Alva Couch


Traditional methods for system performance analysis have long relied on a mix of queuing theory, detailed system knowledge, intuition, and trial-and-error. These approaches often require construction of incomplete gray-box models that can be costly to build and difficult to scale or generalize. In this thesis, we present a black-box analysis method to discover the amount of load on a web server with minimal knowledge of its internal mechanisms. In contrast to white-box analysis, where a system's internal mechanisms can help to explain its behavior, black-box analysis relies on external measurements of a system's reactions to well-understood inputs. The primary advantages of black-box analysis are its relative independence from specific architectures, its applicability to opaque environments (e.g., closed-source systems), and its scalability. In this thesis, we show that statistical analyses of web server response times can be used to discover which server resources are stressed by particular workloads. We also show that under certain conditions, the settling period of server response times after resource perturbation correlates positively with the degree of perturbation. Finally, we use the two-sample Kolmogorov-Smirnov (KS) test to measure statistical equality of multiple samples drawn from response times of a server under various steady-state load conditions. We show that in specific circumstances, the number of samples that test as statistically equal can serve as an imprecise indicator of the amount of load on a server. All of these contributions will aid performance analysis in new environments such as cloud computing, where internal server mechanisms and configurations change dynamically and structural information is hidden from users.

Finding External Indicators of Load on a Web Server via Analysis of Black-Box Performance Measurements. Available from: https://www.researchgate.net/publication/230707525_Finding_External_Indicators_of_Load_on_a_Web_Server_via_Analysis_of_Black-Box_Performance_Measurements [accessed Jun 29, 2017].
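The two-sample KS statistic the abstract relies on is simply the largest distance between two empirical CDFs. A self-contained sketch (sample data are illustrative; the full test also needs a critical value for the sample sizes):

```python
import bisect

def ks_statistic(sample_a, sample_b):
    # Two-sample Kolmogorov-Smirnov statistic: the largest vertical
    # distance between the two empirical CDFs, evaluated at every
    # observed point. 0 means identical CDFs, 1 means disjoint supports.
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Response-time samples from two load conditions that yield a small statistic would "test as statistically equal" in the thesis's sense.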

Cited:  CMG'2005  paper:

Friday, June 16, 2017

The #DynamicThreshold is common art by now...

I received the following feedback on the previous post about some capacity management tool from the author of one of this blog's posts:

"As far as any similarity to my own work, I think my methods for dynamic thresholds are common art by now… and most certainly derived from Igor’s own presentations that I attended 😊"

So I am proud to have made some influence!

Tuesday, June 6, 2017

Re-posting #CMGamplify - "#DataScience Tools for Infrastructure Operational Intelligence"

The following CMG Amplify blog post, written by Tim Browning, is interesting as it underlines what this "System Management by Exception" blog has always been about:

"...In order for the performance analyst to attend to troubled systems that may number in the thousands, it is imperative that we filter out of this vast ocean of time series metrics only those events that are anomalous and/or troubling to operational stability. It is too overwhelming to sit and look at thousands of hourly charts and tables. In addition, there is a need for continuous monitoring capability that detects problems immediately or, better yet, predicts them in the near term.  Increasingly, we need self-managing systems that learn and adapt to complex continuous activities and quickly identify the causal reconstruction of threatening conditions as well as recommend solutions (or even automatically deploy remediation events).  Out of necessity, this is where we are heading..."

and in order to  achieve that:

"..In data mining, anomaly detection (also known as outlier detection) is the search for data items in a dataset which do not conform to an expected pattern. Anomalies are also referred to as outliers, change, deviation, surprise, aberrant, peculiarity, intrusion, etc. Most performance problems are anomalies. Probably the most successful techniques (so far) would be Multivariate Adaptive Statistical Filtering (MASF) for detecting statistically extreme conditions and Relative Entropy Monitoring for detecting unusual changes in patterns of activity..."

See entire post here: https://www.cmg.org/2017/06/data-science-tools-infrastructure-operational-intelligence/
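Relative-entropy monitoring, mentioned in the quote above, compares the current distribution of activity against a baseline distribution. A stdlib sketch using binned counts (the histograms here are illustrative):

```python
import math

def relative_entropy(p_counts, q_counts, eps=1e-9):
    # KL divergence D(P || Q) between two activity histograms,
    # smoothed with eps so empty bins do not produce infinities.
    # Near 0 means the pattern matches the baseline; large values
    # signal an unusual change in the pattern of activity.
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    d = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        d += p * math.log(p / q)
    return d
```

Alerting on a divergence threshold catches shape changes that per-metric limits miss.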

Wednesday, May 17, 2017

Eli Hizkiyev: ConicIT – Ground Breaking Technology Unleashed. Sense and Respond? Why Not Predict and Prevent?

ConicIT Summary

Take your existing performance monitoring environment to the next level. ConicIT, a software solution, reads thousands of performance and stability metrics per minute from your performance monitors such as TMON, Omegamon, Mainview, Sysview and others. ConicIT processes and analyzes these metrics with machine learning technology and automatically generates alerts about problems that even seasoned, professional performance staff might miss when looking at the same data.

Beyond the automatic analytics and alerts, ConicIT provides an efficient and friendly web interface which allows you to browse through relevant performance data in an aggregated way, including watching values and graphs from the very moment a problem occurs. With ConicIT in place, you won't need to tediously jump between many different monitors or screens. ConicIT aggregates the data from different sources into a single view, so you can watch the data easily, receiving either high-level or low-level insights into your application performance.

ConicIT also creates important calculated variables. Examples include ratios, summaries, and critical information such as taking the cumulative CPU-time of a job or transaction and calculating the real-time CPU consumption of jobs and transactions. Much of this information is missing from monitors. The real-time CPU consumption is calculated using the rate at which the CPU-time rises during each minute.
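That rate calculation can be sketched as follows (a hypothetical illustration, not ConicIT's actual code): differencing a cumulative CPU-time counter over each sampling interval yields the real-time consumption.

```python
def cpu_rate_per_minute(samples):
    # samples: list of (timestamp_seconds, cumulative_cpu_seconds) pairs
    # taken from a monitor; returns CPU seconds consumed per minute in
    # each interval, i.e. the rate at which the cumulative counter rises.
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        minutes = (t1 - t0) / 60.0
        rates.append((c1 - c0) / minutes)
    return rates
```

A real implementation would also handle counter resets (e.g. when a job restarts), which make the difference go negative.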

One of the major advantages of ConicIT is its dynamic alerts, which are based on machine learning and statistical algorithms. Traditional monitors offer simple static alerts based on thresholds. But static alerts always come too late, and most of them are false alerts. ConicIT solves this problem with its advanced algorithms. ConicIT automatically studies the typical behavior of each metric for every day of the week and every hour of the day, so ConicIT knows (and shows you) the expected range for each performance metric. ConicIT also learns how stable each variable is and how often and how long it may be out of its normal range. Based on this analysis, ConicIT recognizes when there is an anomaly in one or more metrics. In such a case, ConicIT sends you an alert with information and graphs about the problem. These proactive alerts arrive much earlier and more accurately than those from any static-alert performance system. ConicIT gives you time to solve problems before they affect your end users, clients and customers.
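The dynamic-threshold idea of learning the expected range per day of week and hour of day can be sketched like this (names and data are illustrative, not ConicIT's implementation):

```python
import statistics
from collections import defaultdict

def learn_baselines(observations):
    # observations: iterable of (weekday, hour, value) triples.
    # Learn the typical behavior (mean, standard deviation) of a metric
    # separately for each (day-of-week, hour-of-day) time slot.
    slots = defaultdict(list)
    for weekday, hour, value in observations:
        slots[(weekday, hour)].append(value)
    return {slot: (statistics.mean(v), statistics.pstdev(v))
            for slot, v in slots.items()}

def is_anomaly(baselines, weekday, hour, value, n_sigmas=3.0):
    # Flag a value that falls outside the expected range for its slot.
    mean, sd = baselines[(weekday, hour)]
    return abs(value - mean) > n_sigmas * sd
```

Because the threshold is relative to each slot's own history, a value that is normal for Monday 9 a.m. can still be an anomaly at Sunday 3 a.m.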

The combination of early proactive alerts when the problem starts, along with supportive information and graphs, allows you and your team to quickly pinpoint where the problem started and which team should work on resolving it. Thus, ConicIT reduces the need for war rooms to fix problems and reduces the mean time to repair.

Figure 1: 30 hours graph

Figure 2: It takes a single click (on the left menu) to switch and view any type of information from any point in time

Wednesday, May 3, 2017

The effect of outliers on statistical properties - Anscombe's quartet

Anscombe's quartet comprises four datasets that have nearly identical simple descriptive statistics, yet appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. He described the article as being intended to attack the impression among statisticians that "numerical calculations are exact, but graphs are rough."[1]

Source: https://en.wikipedia.org/wiki/Anscombe%27s_quartet

(You can easily check this in R by loading the data with data(anscombe).) But what you might not realize is that it's possible to generate bivariate data with a given mean, standard deviation, and correlation in any shape you like, even a dinosaur:
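The claim can also be verified without R. Below the four datasets are hard-coded (the standard published values) and their shared statistics computed with a quick stdlib check:

```python
import statistics

def pearson(xs, ys):
    # Pearson correlation coefficient computed from first principles.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets I-III
anscombe = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (xs, ys) in anscombe.items():
    # Every dataset shares (to 2 decimals) the same x mean, y mean and correlation,
    # despite looking completely different when graphed.
    print(name, statistics.mean(xs), round(statistics.mean(ys), 2),
          round(pearson(xs, ys), 2))
```

The numbers agree across all four, which is exactly why the charts, not the summary statistics, expose the outliers.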

Source: The Datasaurus Dozen
Posted: 02 May 2017 08:16 AM PDT
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Reposting CMG blog posts. imPACt 2017 – What to do in New Orleans (#CMGnews)

Do you plan to attend this year's imPACt conference? I do, and I am glad it will be in New Orleans!
Founded by the French, ruled by the Spanish, and bought by the US…New Orleans is known for its distinct Creole culture, Jazz, Mardi Gras and many other attributes that give this city a powerful sense of identity.  So, what do you do beyond attending technical sessions?
In upcoming posts we'll share a few ideas with you. To get you started, here are a few museums that may be of interest.
  • The National World War II Museum: The museum tells the story of the American Experience in the war that changed the world (#4 on TripAdvisor Top 10 list of museums in the USA)
  • New Orleans Historic Voodoo Museum:  Well, of course you’re going to find a voodoo museum in NOLA…how could you not?
  • New Orleans Pharmacy Museum:  This does make sense…Louisiana is the birthplace of modern pharmacy and New Orleans resident, Louis Dufilho, was America’s first licensed pharmacist.

Monday, May 1, 2017

The 8th ACM/SPEC on International Conference on Performance Engineering: MASF and Control charts

ICPE '17  - conference site.

The following paper has a nice summary of how I use MASF and Control charts, and then it proposes a similar but improved technique...

Technique for Detecting Early-Warning Signals of Performance Deterioration in Large Scale Software Systems
Raghu Ramakrishnan † Tata Consultancy Services Noida, UP, INDIA
Arvinder Kaur † USICT, Guru Gobind Singh Indraprastha University Dwarka, Delhi, INDIA

ABSTRACT The detection of early-warning signals of performance deterioration can help technical support teams in taking swift remedial actions, thus ensuring rigor in production support operations of large scale software systems. Performance anomalies or deterioration, if left unattended, often result in system slowness and unavailability. In this paper, we present a simple, intuitive and low-overhead technique for recognizing the early warning signs in near real time before they impact the system. The technique is based on the inverse relationship which exists between throughput and average response time in a closed system. Because of this relationship, a significant increase in the average system response time causes an abrupt fall in system throughput. To identify such occurrences automatically, Individuals and Moving Range (XmR) control charts are used. We also provide a case study from a real-world production system, in which the technique has been successfully used. The use of this technique has reduced the occurrence of performance related incidents significantly in our daily operations. The technique is tool agnostic and can also be easily implemented in popular system monitoring tools by building custom extensions.

".....The use of control charts, MASF and its variations for monitoring software systems was proposed by Trubin et al. [24][25][26][27]. MASF partitions the time during which the system is operational, into hourly, daily or weekly reference segments to characterize repeatable or similar workload behavior experienced by a software system [8]. For example, the workload encountered by the system on Monday between 9:00 a.m. - 10:00 a.m. may be different from the workload between 10:00 a.m. - 11:00 a.m. Each segment is characterized by its mean and standard deviation. The number of reference sets can be further reduced using clustering techniques. The upper and lower limits are established for each reference at three standard deviations from the mean..."
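The XmR chart the paper applies derives its limits from the individual values and their moving ranges; a minimal sketch (the constant 2.66 is the standard XmR scaling factor for moving ranges of size 2; the data are illustrative):

```python
def xmr_limits(values):
    # Individuals & Moving Range (XmR) control limits:
    # center line is the mean of the individual values, and the limits
    # sit 2.66 average moving ranges on either side of it.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return x_bar - 2.66 * mr_bar, x_bar + 2.66 * mr_bar
```

Throughput falling below the lower limit while response time rises would be the abrupt-fall signal the abstract describes.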

[24] I. Trubin. Review of IT Control Chart. Journal of Emerging Trends in Computing and Information Sciences, 4(11):857–868, Dec. 2013.
[25] I. Trubin. Capturing Workload Pathology by Statistical Exception Detection System. In Proceedings of the Computer Measurement Group, 2005.
[26] I. A. Trubin. Global and Application Level Exception Detection System, Based on MASF Technique. In 28th International Computer Measurement Group Conference, December 8-13, 2002, Reno, Nevada, USA, Proceedings, pages 557–566, 2002. 
[27] I. A. Trubin and L. Merritt. Mainframe Global and Workload Level Statistical Exception Detection System, Based on MASF. In 30th International Computer Measurement Group Conference, December 5-10, 2004, Las Vegas, Nevada, USA, Proceedings, pages 671–678, 2004.

Full text: http://dl.acm.org/citation.cfm?id=3044533  

Monday, April 10, 2017

The CMG blog is in production now!

CMG Amplify - new blog about Capacity and Performance

Saturday, January 7, 2017



I Trubin, M Schutt, J Robinson - US Patent 20,160,379,143, 2016

This disclosure relates generally to system modeling, and more particularly to systems and methods for modeling computer resource metrics. In one embodiment, a processor-implemented computer resource metric modeling method is disclosed. The method may include detecting one or more statistical trends in aggregated interaction data for one or more interaction types, and mapping each interaction type to one or more devices facilitating the transactions. The method may further include generating one or more linear regression models of a relationship between device utilization and interaction volume, and calculating one or more diagnostic statistics for the one or more linear regression models. A subset of the linear regression models may be filtered out based on the one or more diagnostic statistics. One or more forecasts may be generated using the remaining linear regression models, using which a report may be generated and provided.

NB: This patent application uses statistical exception and trend detection (SETD) data to do modeling.
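A toy sketch of the pipeline the abstract describes (hypothetical names; not the patented implementation): fit a linear model of device utilization versus interaction volume per device, filter the models by a diagnostic statistic (R-squared here), and forecast only with the survivors.

```python
def fit_line(xs, ys):
    # Ordinary least squares y = a + b*x, returning the coefficients
    # plus R^2 as the diagnostic statistic for this model.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

def forecast_utilization(models, volumes, r2_floor=0.8):
    # Filter out models whose diagnostic statistic is too weak, then
    # forecast utilization from the projected interaction volumes.
    kept = {dev: (a, b) for dev, (a, b, r2) in models.items() if r2 >= r2_floor}
    return {dev: a + b * volumes[dev] for dev, (a, b) in kept.items()}
```

Devices whose utilization does not track interaction volume are dropped rather than producing misleading forecasts, which mirrors the filtering step in the claim.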