Tuesday, July 31, 2012

AIX frame and LPAR level Capacity Planning. User Case for Online Banking Application - my new CMG'12 paper

    I have just received acceptance notifications for the two new papers I wrote and submitted for this year's CMG'12 conference.
    Below is the abstract of the 1st one, which is based on a successful project I had this year.
    AIX frame and LPAR level Capacity Planning. User Case for Online Banking Application
    The paper shares some challenges the Online Banking Capacity Management team had and overcame during the Solaris to AIX migration. The raw capacity estimation model was built to estimate AIX frames capacity needs. The Capacity planning process was adjusted to virtualized environment. The essential system, middleware and database metrics to monitor capacity were identified; business driver correlated forecast reports were built to proactively tune entitlements; IT-Control Charts were created to establish dynamic thresholds for Ph. Processors and IOs usage. Capacity Council was established. 

    The presentation of this paper is scheduled for Wednesday, December 5th, 2012, 9:15 AM - 10:15 AM in Las Vegas, Nevada (check for updates here: http://www.cmg.org/conference/cmg2012/ )
    _____________________________________
    Information about the 2nd paper is in the next post:
    SEDS-Lite: Using Open Source Tools (R, BIRT, MySQL) to Report and Analyze Performance Data



Thursday, July 12, 2012

Just submitted CMG'12 papers abstracts: Very preliminary analysis

Abstracts are published anonymously here: http://www.cmg.org/cgi-bin/abstract_view.pl 
Apparently one of the papers was inspired by me:

Time-Series: Forecasting + Regression: “And” or “Or”?
At CMG’11, I had a fascinating discussion with Dr. I.Trubin. We talked about Uncertainty, Second Law of Thermodynamics, and other high matters in relation to IT. That discussion prompted this paper. We propose a method to get better predictions when we have a forecast of independent variable and a regression. It works for any scenarios where performance can be linked with business metrics. A real-world example is worked through that demonstrates how this technique works to improve the performance metric prediction and highlight trends that would have been overlooked otherwise.
I guess this paper relates to my other post about another paper that uses "entropy":
Quantifying Imbalance in Computer Systems: CMG'11 Trip Report, Part 2
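To make the forecast-plus-regression idea from the "Time-Series" abstract above a bit more concrete, here is a minimal Python sketch. It is not the paper's method (I have not read the paper yet); it only assumes the common two-step approach: extrapolate a business driver with a linear trend, then push the forecasted driver through a regression of the performance metric on that driver. The function names and all the numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_metric(months, driver, metric, future_month):
    """Forecast the driver over time, then map it to the metric."""
    # Step 1: time-series forecast of the business driver (linear trend).
    a_t, b_t = fit_line(months, driver)
    driver_forecast = a_t + b_t * future_month
    # Step 2: regression of the performance metric on the driver.
    a_r, b_r = fit_line(driver, metric)
    return driver_forecast, a_r + b_r * driver_forecast


# Hypothetical data: monthly logins (thousands) and peak CPU utilization (%).
months = [1, 2, 3, 4, 5, 6]
logins = [100, 110, 120, 130, 140, 150]
cpu = [30, 33, 36, 39, 42, 45]

logins_12, cpu_12 = predict_metric(months, logins, cpu, 12)
print(f"month 12: logins ~ {logins_12:.0f}k, peak CPU ~ {cpu_12:.1f}%")
```

The point of chaining the two fits is that the CPU prediction follows the business forecast rather than the CPU history alone, so a change in the driver's trend shows up in the prediction even if the raw CPU trend has not bent yet.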
The following are abstracts of some other papers from the list that could relate to the main topics of this blog. I cannot wait to read them!
Methods for Identifying Anomalous Server Behavior
Identifying anomalous server behavior in large server farms is often overlooked for a variety of reasons. The anomalous behavior does not breach alerting thresholds, or perhaps the behavior is subtle and is simply missed. Whatever the case, it is important to identify such behavior before it becomes more severe. In this paper we discuss methods of identifying server behavior that is anomalous or otherwise uncharacteristic. Methods include statistical techniques such as multidimensional scaling, and machine learning methods such as isolation forests and self organizing maps.

Software Performance Antipatterns for Identifying and Correcting Performance Problems
Performance antipatterns document common software performance problems as well as their solutions. These problems are often introduced during the architectural or design phases of software development, but not detected until later in testing or deployment. Solutions usually require software changes as opposed to system tuning changes. This tutorial covers five performance antipatterns and gives examples to illustrate them. These antipatterns will help developers and performance engineers avoid common performance problems.


Introduction to Wavelets and their Application for Computer Performance Trend and Anomaly Detection
In this paper I will present a technique to identify trends and anomalies in Performance data using wavelets. I will answer the following questions: Why use Wavelets? What are Wavelets? How do I use them?

Application Invariants: Finding constants amidst all the change
This paper presents a method for deriving and utilizing Application Invariants. An Application Invariant is a metric that quantifies the behavior or performance of an application in such a way that its value is immune to changes in workload volume. Several sample Application Invariants are developed and presented. One of the primary benefits of an Application Invariant is that it provides a simple (flat) shape that can readily be used to track changes in application performance or behavior in an automated manner.
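As a rough illustration of the Application Invariant idea in the abstract above, here is a small Python sketch. The choice of "CPU seconds per transaction" as the invariant, the 20% tolerance, and all the data are my assumptions, not taken from the paper. The sketch assumes every interval has at least one transaction.

```python
from statistics import median


def invariant(cpu_seconds, transactions):
    """CPU seconds per transaction for each interval (hypothetical invariant)."""
    return [c / t for c, t in zip(cpu_seconds, transactions)]


def shifted(values, baseline, tolerance=0.2):
    """Indexes whose value differs from the baseline median by more than
    `tolerance`, expressed as a fraction of that median."""
    ref = median(baseline)
    return [i for i, v in enumerate(values) if abs(v - ref) / ref > tolerance]


# Made-up baseline: workload volume triples, yet the ratio stays flat.
txns_before = [1000, 2000, 3000, 1500]
cpu_before = [50.0, 101.0, 149.0, 76.0]

# Made-up later period: from the second interval on, each transaction
# costs noticeably more CPU (for example, after a code release).
txns_after = [1000, 2500, 3000]
cpu_after = [51.0, 190.0, 230.0]

baseline = invariant(cpu_before, txns_before)
current = invariant(cpu_after, txns_after)
print("baseline:", [round(v, 4) for v in baseline])
print("shifted intervals:", shifted(current, baseline))
```

Because the ratio is flat under normal conditions, a single fixed tolerance around the baseline is enough to automate the check, which is the "simple (flat) shape" benefit the abstract mentions.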
A couple of other papers of obvious interest for this blog can be found in that list. I will post them here later.

All in all, at first glance, it looks like this year's CMG conference (http://www.cmg.org/ ) will be a great success.

Monday, July 2, 2012

Advanced process control (APC) and Fault detection and classification (FDC)

On my post "Virtual CMG'90 Trip Report about Control Chart Usage" I received a detailed and very interesting response from my 3rd LinkedIn connection, Mike Clayton, from the Engineering field (not IT at all!). That is of special interest to me, as I originally came from that field (my 1st degree is in Engineering), and the SPC concept was originally designed for Engineering applications and then adopted for IT via MASF in 1995.


Below is our dialog: 


MIKE: Using normally correlated parameters to detect "loss of correlation" as a fault, for example, is common now in monitoring process tools that have many sensors. Loss of expected correlation ties to actual physical faults, right? Is that one of the things you are finding in your history search? FDC as part of APC which has augmented SPC once we have found adjustment algorithms that can be automated based on output parameters IF the toolset or system passes the FDC check....otherwise, call for help?

____
I know nothing of IT performance metrics...except that most IT departments kow-tow to the Finance department, and not the operations dept. So this past year, our COO took over IT and we have been making great REAL performance progress since then at one of my clients.

But fault-detection is same everywhere I have found, in its multivariate nature, with attention to correlation structure changes.

Roy Maxion at CMU years ago wrote some code in old Xerox printer language (Postscript) that put out green-sheet graphs based on genetic algorithm looking at campus internet traffic.
It was amazingly effective for campus network support technicians.

I think Roy published in Journal of Machine Learning over the years. He loved the VAX OS...like me, but was very stubborn about doing anything on Windows OS for a long time, so he missed the big money, but he was technically correct of course. I have great respect for Roy.

IBM's Ray Bunfkowski (spelling?) did pioneering work with APC methods at IBM semiconductor operations, and published for Sematech Workshops, and perhaps IEEE.

He often used Svante Wold's Umetrics software, an early pioneer of multivariate methods for engineers. Umetrics has a package called SimcaP I think.  


IGOR: Yes, it is! Looks like I intuitively went to the APC area applying some similar technique to Computer System performance data. Could you point me to any good books or paper about APC/FDC? Starting with basics... 


MIKE: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05458323  
http://www-mtl.mit.edu/researchgroups/Metrology/PAPERS/goodlin-fault-detect-jecs2003.pdf 
http://www.umetrics.com/fabstat  
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5398983&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5398983 

Many more on the web. Most interesting is to see how FDC and R2R work together nowadays in modern factories (the first reference above).

I first ran into this FDC issue at Motorola in 1990, and tried methods from vendors as well as universities. CMU's machine learning methods using Genetic Algorithms worked well for continuous processes, but were hard to use for discrete manufacturing where small bursts of data from one lot to the next had to be collected and compared based on start and stop signals without the batch run. Dumbing down from the slow learning but precise models of Neural Nets, to the faster learning and more robust models of Nearest Neighbors was part of my early learning.

Costas Spanos at Berkeley, and Dr. Moyne at Univ of Michigan were a big help since FDC ratings in realtime were needed to avoid over-adjusting from R2R feedback systems, interrupting to call engineering support before permitting tuning.
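
Mike's "loss of correlation" check from the dialog above translates naturally to IT performance data. Below is a minimal Python sketch under my own assumptions: two metrics that normally move together (request rate and CPU), a sliding window of 5 samples, and a fault whenever the Pearson correlation inside the window drops below 0.8. The window size, the threshold, the function names, and the data are all hypothetical; real FDC tools use far richer multivariate models.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat series carries no correlation information
    return sxy / sqrt(sxx * syy)


def correlation_faults(xs, ys, window=5, min_r=0.8):
    """Start indexes of the windows where the expected correlation is lost."""
    faults = []
    for start in range(len(xs) - window + 1):
        r = pearson(xs[start:start + window], ys[start:start + window])
        if r < min_r:
            faults.append(start)
    return faults


# Made-up samples: CPU tracks the request rate at first; in the last four
# samples CPU keeps climbing while the request rate falls.
requests = [10, 20, 30, 40, 50, 60, 50, 40, 30, 20]
cpu = [11, 19, 31, 41, 49, 61, 70, 75, 80, 85]

print("windows with lost correlation start at:", correlation_faults(requests, cpu))
```

Note that neither metric needs to breach a static threshold for the fault to be raised; only the relationship between them has to break, which is what makes this check complementary to ordinary control charts.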